# Assignment2: RISC-V Toolchain contributed by < [`Yao1201`](https://github.com/Yao1201) > ## Environment Setting The setup for the virtual machine and Ubuntu Linux environment is well-documented with clear explanations and step-by-step instructions on [Lab2: RISC-V RV32I[MACF] emulator with ELF support](https://hackmd.io/@sysprog/SJAR5XMmi). In addition to that, it also provides instructions on using the GNU Toolchain . ## Question selection I choose the [Calculate the Hamming Distance using Counting Leading Zeros](https://hackmd.io/@O6C2C3zQQBanDM55QRZ7DQ/Lab1_RV32I_assembly) from [author_YP](https://hackmd.io/@O6C2C3zQQBanDM55QRZ7DQ) ## Motivation The reason I chose this topic is that in communication systems, Hamming distance is a crucial piece of information during the decoding stage. By identifying the shortest Hamming distance, the bit error rate in receiving signals can be minimized. In other words, it maximizes the probability of receiving the correct signal. ## Compile C code using RISC-V gcc ### Original C code from [author_YP](https://hackmd.io/@O6C2C3zQQBanDM55QRZ7DQ) ```c #include <stdio.h> #include <stdint.h> uint64_t test1_x0 = 0x0000000000100000; uint64_t test1_x1 = 0x00000000000FFFFF; uint64_t test2_x0 = 0x0000000000000001; uint64_t test2_x1 = 0x7FFFFFFFFFFFFFFE; uint64_t test3_x0 = 0x000000028370228F; uint64_t test3_x1 = 0x000000028370228F; uint16_t count_leading_zeros(uint64_t x){ x |= (x >> 1); x |= (x >> 2); x |= (x >> 4); x |= (x >> 8); x |= (x >> 16); x |= (x >> 32); /* count ones (population count) */ x -= ((x >> 1) & 0x5555555555555555); x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333); x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f; x += (x >> 8); x += (x >> 16); x += (x >> 32); return (64 - (x & 0x7f)); } int HammingDistance(uint64_t x0, uint64_t x1){ int Hdist = 0; int16_t max_digit = 64 - (int16_t)count_leading_zeros((x0 > x1)? x0 : x1); while(max_digit > 0){ uint64_t c1 = x0 & 1; uint64_t c2 = x1 & 1; if(c1 != c2) Hdist += 1; x0 = x0 >> 1; x1 = x1 >> 1; max_digit -= 1; } return Hdist; } int main(){ printf("%d\n", HammingDistance(test1_x0, test1_x1)); printf("%d\n", HammingDistance(test2_x0, test2_x1)); printf("%d\n", HammingDistance(test3_x0, test3_x1)); return 0; } ``` In order to obtain the number of cylces, I added a `get_cycles` function to the main function in my C code: ```diff int main(){ + uint64_t instret = get_instret(); + uint64_t oldcount = get_cycles(); printf("Hamming Distance = %d\n", HammingDistance(test1_x0, test1_x1)); printf("Hamming Distance = %d\n", HammingDistance(test2_x0, test2_x1)); printf("Hamming Distance = %d\n", HammingDistance(test3_x0, test3_x1)); + uint64_t cyclecount = get_cycles() - oldcount; + printf("cycle count: %u\n", (unsigned int) cyclecount); } ``` `get_cycles` is shown as: ```c .text .globl get_cycles .align 2 get_cycles: csrr a1, cycleh csrr a0, cycle csrr a2, cycleh bne a1, a2, get_cycles ret .size get_cycles,.-get_cycles ``` ### Makefile To facilitate compilation, we can write a Makefile: ```shell .PHONY: clean include ../../mk/toolchain.mk ASFLAGS = -march=rv32i_zicsr -mabi=ilp32 CFLAGS = -O1 -Wall OBJS =\ Hammingdist.o\ getcycles.o\ getinstret.o BIN = Hammingdist.elf %.o: %.s riscv-none-elf-gcc $(ASFLAGS) -c -o $@ $< %.o: %.c riscv-none-elf-gcc $(ASFLAGS) $(CFLAGS) -c -o $@ $< all: Hammingdist.elf Hammingdist.elf: $(OBJS) riscv-none-elf-gcc -o $@ $^ clean: $(RM) $(BIN) $(OBJS ``` * Modify `CFLAGS` according to optimized level (O0, O1, O2, O3, Os, Ofast) * Because we use the `CSR` command, we need to add `_zicsr` in `ASFLAGS` additionally. After the aforementioned modifications, we finally ready to start compiling our C file. ### O0 Optimized No optimizations are applied. These is the default compilation options. ```c1! $ ../../build/rv32emu Hammingdist.elf Hamming Distance = 21 Hamming Distance = 63 Hamming Distance = 0 cycle count: 11190 instret: 2ca inferior exit code 0 $ riscv-none-elf-readelf -h Hammingdist.elf ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x100c4 Start of program headers: 52 (bytes into file) Start of section headers: 69644 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 $ riscv-none-elf-size Hammingdist.elf text data bss dec hex filename 52906 1928 1528 56362 dc2a Hammingdist.elf ``` In the Disassembled code, I primarily choose to analyze function `CLZ` and `Hammingdistance` because these two sections play the most critical roles in the program. :::spoiler **Disassembled** ```c 0010148 <count_leading_zeros>: 10148: f7010113 add sp,sp,-144 1014c: 08812623 sw s0,140(sp) 10150: 09212423 sw s2,136(sp) 10154: 09312223 sw s3,132(sp) 10158: 09412023 sw s4,128(sp) 1015c: 07512e23 sw s5,124(sp) 10160: 07612c23 sw s6,120(sp) 10164: 07712a23 sw s7,116(sp) 10168: 07812823 sw s8,112(sp) 1016c: 07912623 sw s9,108(sp) 10170: 07a12423 sw s10,104(sp) 10174: 07b12223 sw s11,100(sp) 10178: 09010413 add s0,sp,144 1017c: fca42423 sw a0,-56(s0) 10180: fcb42623 sw a1,-52(s0) 10184: fcc42783 lw a5,-52(s0) 10188: 01f79713 sll a4,a5,0x1f 1018c: fc842783 lw a5,-56(s0) 10190: 0017d613 srl a2,a5,0x1 10194: 00c76633 or a2,a4,a2 10198: fcc42783 lw a5,-52(s0) 1019c: 0017d693 srl a3,a5,0x1 101a0: fc842783 lw a5,-56(s0) 101a4: 00c7ee33 or t3,a5,a2 101a8: fcc42783 lw a5,-52(s0) 101ac: 00d7eeb3 or t4,a5,a3 101b0: fdc42423 sw t3,-56(s0) 101b4: fdd42623 sw t4,-52(s0) 101b8: fcc42783 lw a5,-52(s0) 101bc: 01e79713 sll a4,a5,0x1e 101c0: fc842783 lw a5,-56(s0) 101c4: 0027d813 srl a6,a5,0x2 101c8: 01076833 or a6,a4,a6 101cc: fcc42783 lw a5,-52(s0) 101d0: 0027d893 srl a7,a5,0x2 101d4: fc842783 lw a5,-56(s0) 101d8: 0107e7b3 or a5,a5,a6 101dc: f8f42823 sw a5,-112(s0) 101e0: fcc42783 lw a5,-52(s0) 101e4: 0117e7b3 or a5,a5,a7 101e8: f8f42a23 sw a5,-108(s0) 101ec: f9042783 lw a5,-112(s0) 101f0: f9442803 lw a6,-108(s0) 101f4: fcf42423 sw a5,-56(s0) 101f8: fd042623 sw a6,-52(s0) 101fc: fcc42783 lw a5,-52(s0) 10200: 01c79713 sll a4,a5,0x1c 10204: fc842783 lw a5,-56(s0) 10208: 0047d313 srl t1,a5,0x4 1020c: 00676333 or t1,a4,t1 10210: fcc42783 lw a5,-52(s0) 10214: 0047d393 srl t2,a5,0x4 10218: fc842783 lw a5,-56(s0) 1021c: 0067e7b3 or a5,a5,t1 10220: f8f42423 sw a5,-120(s0) 10224: fcc42783 lw a5,-52(s0) 10228: 0077e7b3 or a5,a5,t2 1022c: f8f42623 sw a5,-116(s0) 10230: f8842783 lw a5,-120(s0) 10234: f8c42803 lw a6,-116(s0) 10238: fcf42423 sw a5,-56(s0) 1023c: fd042623 sw a6,-52(s0) 10240: fcc42783 lw a5,-52(s0) 10244: 01879713 sll a4,a5,0x18 10248: fc842783 lw a5,-56(s0) 1024c: 0087d913 srl s2,a5,0x8 10250: 01276933 or s2,a4,s2 10254: fcc42783 lw a5,-52(s0) 10258: 0087d993 srl s3,a5,0x8 1025c: fc842783 lw a5,-56(s0) 10260: 0127e7b3 or a5,a5,s2 10264: f8f42023 sw a5,-128(s0) 10268: fcc42783 lw a5,-52(s0) 1026c: 0137e7b3 or a5,a5,s3 10270: f8f42223 sw a5,-124(s0) 10274: f8042783 lw a5,-128(s0) 10278: f8442803 lw a6,-124(s0) 1027c: fcf42423 sw a5,-56(s0) 10280: fd042623 sw a6,-52(s0) 10284: fcc42783 lw a5,-52(s0) 10288: 01079713 sll a4,a5,0x10 1028c: fc842783 lw a5,-56(s0) 10290: 0107d793 srl a5,a5,0x10 10294: fcf42023 sw a5,-64(s0) 10298: fc042783 lw a5,-64(s0) 1029c: 00f767b3 or a5,a4,a5 102a0: fcf42023 sw a5,-64(s0) 102a4: fcc42783 lw a5,-52(s0) 102a8: 0107d793 srl a5,a5,0x10 102ac: fcf42223 sw a5,-60(s0) 102b0: fc842783 lw a5,-56(s0) 102b4: fc042603 lw a2,-64(s0) 102b8: fc442683 lw a3,-60(s0) 102bc: 00060713 mv a4,a2 102c0: 00e7e7b3 or a5,a5,a4 102c4: f6f42c23 sw a5,-136(s0) 102c8: fcc42783 lw a5,-52(s0) 102cc: 00068713 mv a4,a3 102d0: 00e7e7b3 or a5,a5,a4 102d4: f6f42e23 sw a5,-132(s0) 102d8: f7842783 lw a5,-136(s0) 102dc: f7c42803 lw a6,-132(s0) 102e0: fcf42423 sw a5,-56(s0) 102e4: fd042623 sw a6,-52(s0) 102e8: fcc42783 lw a5,-52(s0) 102ec: 0007d793 srl a5,a5,0x0 102f0: faf42c23 sw a5,-72(s0) 102f4: fa042e23 sw zero,-68(s0) 102f8: fc842783 lw a5,-56(s0) 102fc: fb842603 lw a2,-72(s0) 10300: fbc42683 lw a3,-68(s0) 10304: 00060713 mv a4,a2 10308: 00e7e7b3 or a5,a5,a4 1030c: f6f42823 sw a5,-144(s0) 10310: fcc42783 lw a5,-52(s0) 10314: 00068713 mv a4,a3 10318: 00e7e7b3 or a5,a5,a4 1031c: f6f42a23 sw a5,-140(s0) 10320: f7042783 lw a5,-144(s0) 10324: f7442803 lw a6,-140(s0) 10328: fcf42423 sw a5,-56(s0) 1032c: fd042623 sw a6,-52(s0) 10330: fcc42783 lw a5,-52(s0) 10334: 01f79793 sll a5,a5,0x1f 10338: fc842703 lw a4,-56(s0) 1033c: 00175d13 srl s10,a4,0x1 10340: 01a7ed33 or s10,a5,s10 10344: fcc42783 lw a5,-52(s0) 10348: 0017dd93 srl s11,a5,0x1 1034c: 555557b7 lui a5,0x55555 10350: 55578793 add a5,a5,1365 # 55555555 <__BSS_END__+0x555377d1> 10354: 00fd77b3 and a5,s10,a5 10358: faf42823 sw a5,-80(s0) 1035c: 555557b7 lui a5,0x55555 10360: 55578793 add a5,a5,1365 # 55555555 <__BSS_END__+0x555377d1> 10364: 00fdf7b3 and a5,s11,a5 10368: faf42a23 sw a5,-76(s0) 1036c: fc842603 lw a2,-56(s0) 10370: fcc42683 lw a3,-52(s0) 10374: fb042803 lw a6,-80(s0) 10378: fb442883 lw a7,-76(s0) 1037c: 00080593 mv a1,a6 10380: 40b60733 sub a4,a2,a1 10384: 00070593 mv a1,a4 10388: 00b635b3 sltu a1,a2,a1 1038c: 00088513 mv a0,a7 10390: 40a687b3 sub a5,a3,a0 10394: 40b786b3 sub a3,a5,a1 10398: 00068793 mv a5,a3 1039c: fce42423 sw a4,-56(s0) 103a0: fcf42623 sw a5,-52(s0) 103a4: fcc42783 lw a5,-52(s0) 103a8: 01e79793 sll a5,a5,0x1e 103ac: fc842703 lw a4,-56(s0) 103b0: 00275c13 srl s8,a4,0x2 103b4: 0187ec33 or s8,a5,s8 103b8: fcc42783 lw a5,-52(s0) 103bc: 0027dc93 srl s9,a5,0x2 103c0: 333337b7 lui a5,0x33333 103c4: 33378793 add a5,a5,819 # 33333333 <__BSS_END__+0x333155af> 103c8: 00fc77b3 and a5,s8,a5 103cc: faf42423 sw a5,-88(s0) 103d0: 333337b7 lui a5,0x33333 103d4: 33378793 add a5,a5,819 # 33333333 <__BSS_END__+0x333155af> 103d8: 00fcf7b3 and a5,s9,a5 103dc: faf42623 sw a5,-84(s0) 103e0: fc842703 lw a4,-56(s0) 103e4: 333337b7 lui a5,0x33333 103e8: 33378793 add a5,a5,819 # 33333333 <__BSS_END__+0x333155af> 103ec: 00f777b3 and a5,a4,a5 103f0: faf42023 sw a5,-96(s0) 103f4: fcc42703 lw a4,-52(s0) 103f8: 333337b7 lui a5,0x33333 103fc: 33378793 add a5,a5,819 # 33333333 <__BSS_END__+0x333155af> 10400: 00f777b3 and a5,a4,a5 10404: faf42223 sw a5,-92(s0) 10408: fa842503 lw a0,-88(s0) 1040c: fac42583 lw a1,-84(s0) 10410: 00050693 mv a3,a0 10414: fa042803 lw a6,-96(s0) 10418: fa442883 lw a7,-92(s0) 1041c: 00080613 mv a2,a6 10420: 00c68733 add a4,a3,a2 10424: 00070693 mv a3,a4 10428: 00050613 mv a2,a0 1042c: 00c6b6b3 sltu a3,a3,a2 10430: 00058613 mv a2,a1 10434: 00088593 mv a1,a7 10438: 00b607b3 add a5,a2,a1 1043c: 00f686b3 add a3,a3,a5 10440: 00068793 mv a5,a3 10444: fce42423 sw a4,-56(s0) 10448: fcf42623 sw a5,-52(s0) 1044c: fcc42783 lw a5,-52(s0) 10450: 01c79793 sll a5,a5,0x1c 10454: fc842703 lw a4,-56(s0) 10458: 00475f13 srl t5,a4,0x4 1045c: 01e7ef33 or t5,a5,t5 10460: fcc42783 lw a5,-52(s0) 10464: 0047df93 srl t6,a5,0x4 10468: fc842603 lw a2,-56(s0) 1046c: fcc42683 lw a3,-52(s0) 10470: 00cf0733 add a4,t5,a2 10474: 00070593 mv a1,a4 10478: 01e5b5b3 sltu a1,a1,t5 1047c: 00df87b3 add a5,t6,a3 10480: 00f586b3 add a3,a1,a5 10484: 00068793 mv a5,a3 10488: 0f0f16b7 lui a3,0xf0f1 1048c: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b> 10490: 00d776b3 and a3,a4,a3 10494: fcd42423 sw a3,-56(s0) 10498: 0f0f16b7 lui a3,0xf0f1 1049c: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b> 104a0: 00d7f7b3 and a5,a5,a3 104a4: fcf42623 sw a5,-52(s0) 104a8: fcc42783 lw a5,-52(s0) 104ac: 01879793 sll a5,a5,0x18 104b0: fc842703 lw a4,-56(s0) 104b4: 00875b13 srl s6,a4,0x8 104b8: 0167eb33 or s6,a5,s6 104bc: fcc42783 lw a5,-52(s0) 104c0: 0087db93 srl s7,a5,0x8 104c4: fc842603 lw a2,-56(s0) 104c8: fcc42683 lw a3,-52(s0) 104cc: 01660733 add a4,a2,s6 104d0: 00070593 mv a1,a4 104d4: 00c5b5b3 sltu a1,a1,a2 104d8: 017687b3 add a5,a3,s7 104dc: 00f586b3 add a3,a1,a5 104e0: 00068793 mv a5,a3 104e4: fce42423 sw a4,-56(s0) 104e8: fcf42623 sw a5,-52(s0) 104ec: fcc42783 lw a5,-52(s0) 104f0: 01079793 sll a5,a5,0x10 104f4: fc842703 lw a4,-56(s0) 104f8: 01075a13 srl s4,a4,0x10 104fc: 0147ea33 or s4,a5,s4 10500: fcc42783 lw a5,-52(s0) 10504: 0107da93 srl s5,a5,0x10 10508: fc842603 lw a2,-56(s0) 1050c: fcc42683 lw a3,-52(s0) 10510: 01460733 add a4,a2,s4 10514: 00070593 mv a1,a4 10518: 00c5b5b3 sltu a1,a1,a2 1051c: 015687b3 add a5,a3,s5 10520: 00f586b3 add a3,a1,a5 10524: 00068793 mv a5,a3 10528: fce42423 sw a4,-56(s0) 1052c: fcf42623 sw a5,-52(s0) 10530: fcc42783 lw a5,-52(s0) 10534: 0007d793 srl a5,a5,0x0 10538: f8f42c23 sw a5,-104(s0) 1053c: f8042e23 sw zero,-100(s0) 10540: fc842603 lw a2,-56(s0) 10544: fcc42683 lw a3,-52(s0) 10548: f9842803 lw a6,-104(s0) 1054c: f9c42883 lw a7,-100(s0) 10550: 00080593 mv a1,a6 10554: 00b60733 add a4,a2,a1 10558: 00070593 mv a1,a4 1055c: 00c5b5b3 sltu a1,a1,a2 10560: 00088513 mv a0,a7 10564: 00a687b3 add a5,a3,a0 10568: 00f586b3 add a3,a1,a5 1056c: 00068793 mv a5,a3 10570: fce42423 sw a4,-56(s0) 10574: fcf42623 sw a5,-52(s0) 10578: fc845783 lhu a5,-56(s0) 1057c: 07f7f793 and a5,a5,127 10580: 01079793 sll a5,a5,0x10 10584: 0107d793 srl a5,a5,0x10 10588: 04000713 li a4,64 1058c: 40f707b3 sub a5,a4,a5 10590: 01079793 sll a5,a5,0x10 10594: 0107d793 srl a5,a5,0x10 10598: 00078513 mv a0,a5 1059c: 08c12403 lw s0,140(sp) 105a0: 08812903 lw s2,136(sp) 105a4: 08412983 lw s3,132(sp) 105a8: 08012a03 lw s4,128(sp) 105ac: 07c12a83 lw s5,124(sp) 105b0: 07812b03 lw s6,120(sp) 105b4: 07412b83 lw s7,116(sp) 105b8: 07012c03 lw s8,112(sp) 105bc: 06c12c83 lw s9,108(sp) 105c0: 06812d03 lw s10,104(sp) 105c4: 06412d83 lw s11,100(sp) 105c8: 09010113 add sp,sp,144 105cc: 00008067 ret 000105d0 <HammingDistance>: 105d0: fa010113 add sp,sp,-96 105d4: 04112e23 sw ra,92(sp) 105d8: 04812c23 sw s0,88(sp) 105dc: 05212a23 sw s2,84(sp) 105e0: 05312823 sw s3,80(sp) 105e4: 05412623 sw s4,76(sp) 105e8: 05512423 sw s5,72(sp) 105ec: 05612223 sw s6,68(sp) 105f0: 05712023 sw s7,64(sp) 105f4: 03812e23 sw s8,60(sp) 105f8: 03912c23 sw s9,56(sp) 105fc: 06010413 add s0,sp,96 10600: faa42423 sw a0,-88(s0) 10604: fab42623 sw a1,-84(s0) 10608: fac42023 sw a2,-96(s0) 1060c: fad42223 sw a3,-92(s0) 10610: fc042623 sw zero,-52(s0) 10614: fa842603 lw a2,-88(s0) 10618: fac42683 lw a3,-84(s0) 1061c: fa042703 lw a4,-96(s0) 10620: fa442783 lw a5,-92(s0) 10624: 00068513 mv a0,a3 10628: 00078593 mv a1,a5 1062c: 00a5ee63 bltu a1,a0,10648 <HammingDistance+0x78> 10630: 00068513 mv a0,a3 10634: 00078593 mv a1,a5 10638: 00b51c63 bne a0,a1,10650 <HammingDistance+0x80> 1063c: 00060513 mv a0,a2 10640: 00070593 mv a1,a4 10644: 00a5f663 bgeu a1,a0,10650 <HammingDistance+0x80> 10648: 00060713 mv a4,a2 1064c: 00068793 mv a5,a3 10650: 00070513 mv a0,a4 10654: 00078593 mv a1,a5 10658: af1ff0ef jal 10148 <count_leading_zeros> 1065c: 00050793 mv a5,a0 10660: 00078713 mv a4,a5 10664: 04000793 li a5,64 10668: 40e787b3 sub a5,a5,a4 1066c: 01079793 sll a5,a5,0x10 10670: 0107d793 srl a5,a5,0x10 10674: fcf41523 sh a5,-54(s0) 10678: 0b40006f j 1072c <HammingDistance+0x15c> 1067c: fa842783 lw a5,-88(s0) 10680: 0017fb13 and s6,a5,1 10684: fac42783 lw a5,-84(s0) 10688: 0007fb93 and s7,a5,0 1068c: fd642023 sw s6,-64(s0) 10690: fd742223 sw s7,-60(s0) 10694: fa042783 lw a5,-96(s0) 10698: 0017fc13 and s8,a5,1 1069c: fa442783 lw a5,-92(s0) 106a0: 0007fc93 and s9,a5,0 106a4: fb842c23 sw s8,-72(s0) 106a8: fb942e23 sw s9,-68(s0) 106ac: fc042703 lw a4,-64(s0) 106b0: fb842783 lw a5,-72(s0) 106b4: 00f71863 bne a4,a5,106c4 <HammingDistance+0xf4> 106b8: fc442703 lw a4,-60(s0) 106bc: fbc42783 lw a5,-68(s0) 106c0: 00f70863 beq a4,a5,106d0 <HammingDistance+0x100> 106c4: fcc42783 lw a5,-52(s0) 106c8: 00178793 add a5,a5,1 106cc: fcf42623 sw a5,-52(s0) 106d0: fac42783 lw a5,-84(s0) 106d4: 01f79793 sll a5,a5,0x1f 106d8: fa842703 lw a4,-88(s0) 106dc: 00175913 srl s2,a4,0x1 106e0: 0127e933 or s2,a5,s2 106e4: fac42783 lw a5,-84(s0) 106e8: 0017d993 srl s3,a5,0x1 106ec: fb242423 sw s2,-88(s0) 106f0: fb342623 sw s3,-84(s0) 106f4: fa442783 lw a5,-92(s0) 106f8: 01f79793 sll a5,a5,0x1f 106fc: fa042703 lw a4,-96(s0) 10700: 00175a13 srl s4,a4,0x1 10704: 0147ea33 or s4,a5,s4 10708: fa442783 lw a5,-92(s0) 1070c: 0017da93 srl s5,a5,0x1 10710: fb442023 sw s4,-96(s0) 10714: fb542223 sw s5,-92(s0) 10718: fca45783 lhu a5,-54(s0) 1071c: fff78793 add a5,a5,-1 10720: 01079793 sll a5,a5,0x10 10724: 0107d793 srl a5,a5,0x10 10728: fcf41523 sh a5,-54(s0) 1072c: fca41783 lh a5,-54(s0) 10730: f4f046e3 bgtz a5,1067c <HammingDistance+0xac> 10734: fcc42783 lw a5,-52(s0) 10738: 00078513 mv a0,a5 1073c: 05c12083 lw ra,92(sp) 10740: 05812403 lw s0,88(sp) 10744: 05412903 lw s2,84(sp) 10748: 05012983 lw s3,80(sp) 1074c: 04c12a03 lw s4,76(sp) 10750: 04812a83 lw s5,72(sp) 10754: 04412b03 lw s6,68(sp) 10758: 04012b83 lw s7,64(sp) 1075c: 03c12c03 lw s8,60(sp) 10760: 03812c83 lw s9,56(sp) 10764: 06010113 add sp,sp,96 10768: 00008067 ret ``` ::: #### Observation * Register Usage | Register | CLZ | Hammingdistnce | |:--------:|:--------------:|:--------------:| | a-type | 7 (a0-a6) | 6(a0-a5) | | s-type | 11(s1, s2-s11) | 9(s0, s2-s9) | | t-type | 2(t5, t6) | 0 | | other | sp | ra, sp | * Stack Usage | Stack | CLZ | Hammingdistance | | ----- | --- |:---------------:| | Usage | 144 | 96 | * Lines of code/`lw`/`sw` | Line of code | lw | sw | |:------------:| --- | --- | | 371 | 120 | 84 | The code has 371 lines. It's too lengthy and it has a bunch of `lw`/`sw`. Besides, a lot of `mv`/`lw`/`sw` are redundantm, resulting in low efficiency code. ### O1 Optimized The compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time. ```shell $ ../../build/rv32emu Hammingdist.elf Hamming Distance = 21 Hamming Distance = 63 Hamming Distance = 0 cycle count: 7198 instret: 2cb inferior exit code 0 $ riscv-none-elf-readelf -h Hammingdist.elf ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x100c2 Start of program headers: 52 (bytes into file) Start of section headers: 69540 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 $ riscv-none-elf-size Hammingdist.elf text data bss dec hex filename 51790 1928 1528 55246 d7ce Hammingdist.elf ``` :::spoiler **Disassembled** ```c 00010144 <count_leading_zeros>: 10144: 01f59713 sll a4,a1,0x1f 10148: 00155793 srl a5,a0,0x1 1014c: 00f767b3 or a5,a4,a5 10150: 0015d713 srl a4,a1,0x1 10154: 00a7e533 or a0,a5,a0 10158: 00b765b3 or a1,a4,a1 1015c: 01e59713 sll a4,a1,0x1e 10160: 00255793 srl a5,a0,0x2 10164: 00f767b3 or a5,a4,a5 10168: 0025d613 srl a2,a1,0x2 1016c: 00a7e533 or a0,a5,a0 10170: 00b66633 or a2,a2,a1 10174: 01c61713 sll a4,a2,0x1c 10178: 00455793 srl a5,a0,0x4 1017c: 00f767b3 or a5,a4,a5 10180: 00465693 srl a3,a2,0x4 10184: 00a7e733 or a4,a5,a0 10188: 00c6e6b3 or a3,a3,a2 1018c: 01869613 sll a2,a3,0x18 10190: 00875793 srl a5,a4,0x8 10194: 00f667b3 or a5,a2,a5 10198: 0086d613 srl a2,a3,0x8 1019c: 00e7e7b3 or a5,a5,a4 101a0: 00d66633 or a2,a2,a3 101a4: 01061713 sll a4,a2,0x10 101a8: 0107d693 srl a3,a5,0x10 101ac: 00d766b3 or a3,a4,a3 101b0: 01065713 srl a4,a2,0x10 101b4: 00f6e6b3 or a3,a3,a5 101b8: 00c76733 or a4,a4,a2 101bc: 00d766b3 or a3,a4,a3 101c0: 01f71613 sll a2,a4,0x1f 101c4: 0016d793 srl a5,a3,0x1 101c8: 00f667b3 or a5,a2,a5 101cc: 00175593 srl a1,a4,0x1 101d0: 55555637 lui a2,0x55555 101d4: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x555377d1> 101d8: 00c7f7b3 and a5,a5,a2 101dc: 00c5f633 and a2,a1,a2 101e0: 40f687b3 sub a5,a3,a5 101e4: 00f6b6b3 sltu a3,a3,a5 101e8: 40c70733 sub a4,a4,a2 101ec: 40d70733 sub a4,a4,a3 101f0: 01e71613 sll a2,a4,0x1e 101f4: 0027d693 srl a3,a5,0x2 101f8: 00d666b3 or a3,a2,a3 101fc: 00275593 srl a1,a4,0x2 10200: 33333637 lui a2,0x33333 10204: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155af> 10208: 00c6f6b3 and a3,a3,a2 1020c: 00c5f5b3 and a1,a1,a2 10210: 00c7f7b3 and a5,a5,a2 10214: 00c77733 and a4,a4,a2 10218: 00f687b3 add a5,a3,a5 1021c: 00d7b6b3 sltu a3,a5,a3 10220: 00e58733 add a4,a1,a4 10224: 00e686b3 add a3,a3,a4 10228: 01c69613 sll a2,a3,0x1c 1022c: 0047d713 srl a4,a5,0x4 10230: 00e66733 or a4,a2,a4 10234: 0046d613 srl a2,a3,0x4 10238: 00f707b3 add a5,a4,a5 1023c: 00e7b733 sltu a4,a5,a4 10240: 00d606b3 add a3,a2,a3 10244: 00d70733 add a4,a4,a3 10248: 0f0f16b7 lui a3,0xf0f1 1024c: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b> 10250: 00d7f7b3 and a5,a5,a3 10254: 00d77733 and a4,a4,a3 10258: 01871613 sll a2,a4,0x18 1025c: 0087d693 srl a3,a5,0x8 10260: 00d666b3 or a3,a2,a3 10264: 00875613 srl a2,a4,0x8 10268: 00f687b3 add a5,a3,a5 1026c: 00d7b6b3 sltu a3,a5,a3 10270: 00e60733 add a4,a2,a4 10274: 00e686b3 add a3,a3,a4 10278: 01069613 sll a2,a3,0x10 1027c: 0107d713 srl a4,a5,0x10 10280: 00e66733 or a4,a2,a4 10284: 0106d613 srl a2,a3,0x10 10288: 00f707b3 add a5,a4,a5 1028c: 00e7b733 sltu a4,a5,a4 10290: 00d606b3 add a3,a2,a3 10294: 00d70733 add a4,a4,a3 10298: 00f70733 add a4,a4,a5 1029c: 07f77713 and a4,a4,127 102a0: 04000513 li a0,64 102a4: 40e50533 sub a0,a0,a4 102a8: 01051513 sll a0,a0,0x10 102ac: 01055513 srl a0,a0,0x10 102b0: 00008067 ret 000102b4 <HammingDistance>: 102b4: fe010113 add sp,sp,-32 102b8: 00112e23 sw ra,28(sp) 102bc: 00812c23 sw s0,24(sp) 102c0: 00912a23 sw s1,20(sp) 102c4: 01212823 sw s2,16(sp) 102c8: 01312623 sw s3,12(sp) 102cc: 00050493 mv s1,a0 102d0: 00058913 mv s2,a1 102d4: 00060413 mv s0,a2 102d8: 00068993 mv s3,a3 102dc: 00b6ea63 bltu a3,a1,102f0 <HammingDistance+0x3c> 102e0: 00060513 mv a0,a2 102e4: 00068593 mv a1,a3 102e8: 00d91863 bne s2,a3,102f8 <HammingDistance+0x44> 102ec: 00967663 bgeu a2,s1,102f8 <HammingDistance+0x44> 102f0: 00048513 mv a0,s1 102f4: 00090593 mv a1,s2 102f8: e4dff0ef jal 10144 <count_leading_zeros> 102fc: 04000793 li a5,64 10300: 40a787b3 sub a5,a5,a0 10304: 01079793 sll a5,a5,0x10 10308: 4107d793 sra a5,a5,0x10 1030c: 06f05063 blez a5,1036c <HammingDistance+0xb8> 10310: 00000513 li a0,0 10314: 0084c733 xor a4,s1,s0 10318: 00177713 and a4,a4,1 1031c: 00e50533 add a0,a0,a4 10320: 01f91713 sll a4,s2,0x1f 10324: 0014d493 srl s1,s1,0x1 10328: 009764b3 or s1,a4,s1 1032c: 00195913 srl s2,s2,0x1 10330: 01f99713 sll a4,s3,0x1f 10334: 00145413 srl s0,s0,0x1 10338: 00876433 or s0,a4,s0 1033c: 0019d993 srl s3,s3,0x1 10340: fff78793 add a5,a5,-1 10344: 01079793 sll a5,a5,0x10 10348: 4107d793 sra a5,a5,0x10 1034c: fc0794e3 bnez a5,10314 <HammingDistance+0x60> 10350: 01c12083 lw ra,28(sp) 10354: 01812403 lw s0,24(sp) 10358: 01412483 lw s1,20(sp) 1035c: 01012903 lw s2,16(sp) 10360: 00c12983 lw s3,12(sp) 10364: 02010113 add sp,sp,32 10368: 00008067 ret 1036c: 00000513 li a0,0 10370: fe1ff06f j 10350 <HammingDistance+0x9c> ``` ::: #### Observation * Register Usage | Register | CLZ | Hammingdistnce | |:--------:|:--------------:|:--------------:| | a-type | 6 (a0-a5) | 6(a0-a5) | | s-type | 0 | 4(s0-s3) | | t-type | 0 | 0 | | other | 0 | ra, sp | * Stack Usage | Stack | CLZ | Hammingdistance | | ----- | --- |:---------------:| | Usage | 0 | 32 | * Lines of code/`lw`/`sw` | Line of code | lw | sw | |:------------:| --- | --- | | 144 | 5 | 5 | Compare to O0 optimaztion, the number of `line of code`, `lw`, `sw` are decreased greatly. Besides, the usage of register are reduced too. ### O2 Optimized Optimize even more. ```c $ ../../build/rv32emu Hammingdist.elf Hamming Distance = 21 Hamming Distance = 63 Hamming Distance = 0 cycle count: 7435 instret: 2cb inferior exit code 0 $ riscv-none-elf-readelf -h Hammingdist.elf ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x1018a Start of program headers: 52 (bytes into file) Start of section headers: 69564 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 $ riscv-none-elf-size Hammingdist.elf text data bss dec hex filename 51794 1928 1528 55250 d7d2 Hammingdist.elf ``` :::spoiler **Disaasemble** ```c 0001020c <count_leading_zeros>: 1020c: 01f59713 sll a4,a1,0x1f 10210: 00155793 srl a5,a0,0x1 10214: 00f767b3 or a5,a4,a5 10218: 0015d713 srl a4,a1,0x1 1021c: 00b765b3 or a1,a4,a1 10220: 00a7e533 or a0,a5,a0 10224: 01e59713 sll a4,a1,0x1e 10228: 00255793 srl a5,a0,0x2 1022c: 00f767b3 or a5,a4,a5 10230: 0025d713 srl a4,a1,0x2 10234: 00b76733 or a4,a4,a1 10238: 00a7e533 or a0,a5,a0 1023c: 01c71693 sll a3,a4,0x1c 10240: 00455793 srl a5,a0,0x4 10244: 00f6e7b3 or a5,a3,a5 10248: 00475693 srl a3,a4,0x4 1024c: 00e6e6b3 or a3,a3,a4 10250: 00a7e733 or a4,a5,a0 10254: 01869613 sll a2,a3,0x18 10258: 00875793 srl a5,a4,0x8 1025c: 00f667b3 or a5,a2,a5 10260: 0086d613 srl a2,a3,0x8 10264: 00d66633 or a2,a2,a3 10268: 00e7e7b3 or a5,a5,a4 1026c: 0107d693 srl a3,a5,0x10 10270: 01061713 sll a4,a2,0x10 10274: 00d766b3 or a3,a4,a3 10278: 01065713 srl a4,a2,0x10 1027c: 00c76733 or a4,a4,a2 10280: 00f6e6b3 or a3,a3,a5 10284: 00d766b3 or a3,a4,a3 10288: 01f71593 sll a1,a4,0x1f 1028c: 0016d793 srl a5,a3,0x1 10290: 55555637 lui a2,0x55555 10294: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x555377d1> 10298: 00f5e7b3 or a5,a1,a5 1029c: 00c7f7b3 and a5,a5,a2 102a0: 00175593 srl a1,a4,0x1 102a4: 40f687b3 sub a5,a3,a5 102a8: 00c5f633 and a2,a1,a2 102ac: 00f6b6b3 sltu a3,a3,a5 102b0: 40c70733 sub a4,a4,a2 102b4: 40d70733 sub a4,a4,a3 102b8: 01e71593 sll a1,a4,0x1e 102bc: 0027d693 srl a3,a5,0x2 102c0: 33333637 lui a2,0x33333 102c4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155af> 102c8: 00d5e6b3 or a3,a1,a3 102cc: 00c6f6b3 and a3,a3,a2 102d0: 00275593 srl a1,a4,0x2 102d4: 00c7f7b3 and a5,a5,a2 102d8: 00f687b3 add a5,a3,a5 102dc: 00c5f5b3 and a1,a1,a2 102e0: 00c77733 and a4,a4,a2 102e4: 00e58733 add a4,a1,a4 102e8: 00d7b6b3 sltu a3,a5,a3 102ec: 00e686b3 add a3,a3,a4 102f0: 01c69613 sll a2,a3,0x1c 102f4: 0047d713 srl a4,a5,0x4 102f8: 00e66733 or a4,a2,a4 102fc: 00f707b3 add a5,a4,a5 10300: 0046d613 srl a2,a3,0x4 10304: 00d60633 add a2,a2,a3 10308: 00e7b733 sltu a4,a5,a4 1030c: 0f0f16b7 lui a3,0xf0f1 10310: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b> 10314: 00c70733 add a4,a4,a2 10318: 00d77733 and a4,a4,a3 1031c: 00d7f7b3 and a5,a5,a3 10320: 01871613 sll a2,a4,0x18 10324: 0087d693 srl a3,a5,0x8 10328: 00d666b3 or a3,a2,a3 1032c: 00f687b3 add a5,a3,a5 10330: 00875613 srl a2,a4,0x8 10334: 00e60733 add a4,a2,a4 10338: 00d7b6b3 sltu a3,a5,a3 1033c: 00e686b3 add a3,a3,a4 10340: 01069613 sll a2,a3,0x10 10344: 0107d713 srl a4,a5,0x10 10348: 00e66733 or a4,a2,a4 1034c: 00f707b3 add a5,a4,a5 10350: 0106d613 srl a2,a3,0x10 10354: 00e7b733 sltu a4,a5,a4 10358: 00d606b3 add a3,a2,a3 1035c: 00d70733 add a4,a4,a3 10360: 00f70733 add a4,a4,a5 10364: 07f77713 and a4,a4,127 10368: 04000513 li a0,64 1036c: 40e50533 sub a0,a0,a4 10370: 01051513 sll a0,a0,0x10 10374: 01055513 srl a0,a0,0x10 10378: 00008067 ret 0001037c <HammingDistance>: 1037c: fe010113 add sp,sp,-32 10380: 00812c23 sw s0,24(sp) 10384: 00912a23 sw s1,20(sp) 10388: 01212823 sw s2,16(sp) 1038c: 01312623 sw s3,12(sp) 10390: 00112e23 sw ra,28(sp) 10394: 00060413 mv s0,a2 10398: 00068913 mv s2,a3 1039c: 00058993 mv s3,a1 103a0: 00050493 mv s1,a0 103a4: 08b6e863 bltu a3,a1,10434 <HammingDistance+0xb8> 103a8: 00060513 mv a0,a2 103ac: 00068593 mv a1,a3 103b0: 08d98063 beq s3,a3,10430 <HammingDistance+0xb4> 103b4: e59ff0ef jal 1020c <count_leading_zeros> 103b8: 04000793 li a5,64 103bc: 40a787b3 sub a5,a5,a0 103c0: 01079793 sll a5,a5,0x10 103c4: 4107d793 sra a5,a5,0x10 103c8: 00000513 li a0,0 103cc: 04f05463 blez a5,10414 <HammingDistance+0x98> 103d0: fff78793 add a5,a5,-1 103d4: 0084c733 xor a4,s1,s0 103d8: 01079693 sll a3,a5,0x10 103dc: 01f99593 sll a1,s3,0x1f 103e0: 01f91613 sll a2,s2,0x1f 103e4: 00177713 and a4,a4,1 103e8: 0014d493 srl s1,s1,0x1 103ec: 00145413 srl s0,s0,0x1 103f0: 01079793 sll a5,a5,0x10 103f4: 0106d693 srl a3,a3,0x10 103f8: 00e50533 add a0,a0,a4 103fc: 0095e4b3 or s1,a1,s1 10400: 0019d993 srl s3,s3,0x1 10404: 00866433 or s0,a2,s0 10408: 00195913 srl s2,s2,0x1 1040c: 4107d793 sra a5,a5,0x10 10410: fc0690e3 bnez a3,103d0 <HammingDistance+0x54> 10414: 01c12083 lw ra,28(sp) 10418: 01812403 lw s0,24(sp) 1041c: 01412483 lw s1,20(sp) 10420: 01012903 lw s2,16(sp) 10424: 00c12983 lw s3,12(sp) 10428: 02010113 add sp,sp,32 1042c: 00008067 ret 10430: f89672e3 bgeu a2,s1,103b4 <HammingDistance+0x38> 10434: 00048513 mv a0,s1 10438: 00098593 mv a1,s3 1043c: f79ff06f j 103b4 <HammingDistance+0x38> ``` ::: #### Observation * Register Usage | Register | CLZ | Hammingdistnce | |:--------:|:--------------:|:--------------:| | a-type | 6 (a0-a5) | 6(a0-a5) | | s-type | 0 | 4(s0-s3) | | t-type | 0 | 0 | | other | 0 | ra, sp | * Stack Usage | Stack | CLZ | Hammingdistance | | ----- | --- |:---------------:| | Usage |0 | 32 | * Lines of code/`lw`/`sw` | Line of code | lw | sw | |:------------:| --- | --- | | 144 | 5 | 5 | O2 optimaztion is similar to O1. ### O3 Optimized Optimize yet more. Turns on all optimizations specified by -O2 and also turns on some optimization flags, like `-floop-unroll-and-jam` ```c $ ../../build/rv32emu Hammingdist.elf Hamming Distance = 21 Hamming Distance = 63 Hamming Distance = 0 cycle count: 7363 instret: 2cb inferior exit code 0 $ riscv-none-elf-readelf -h Hammingdist.elf ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x10334 Start of program headers: 52 (bytes into file) Start of section headers: 70132 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 $ riscv-none-elf-size Hammingdist.elf text data bss dec hex filename 52074 1928 1528 55530 d8ea Hammingdist.elf ``` :::spoiler **Disaasemble** ```c 0001020c <count_leading_zeros>: 1020c: 01f59713 sll a4,a1,0x1f 10210: 00155793 srl a5,a0,0x1 10214: 00f767b3 or a5,a4,a5 10218: 0015d713 srl a4,a1,0x1 1021c: 00b765b3 or a1,a4,a1 10220: 00a7e533 or a0,a5,a0 10224: 01e59713 sll a4,a1,0x1e 10228: 00255793 srl a5,a0,0x2 1022c: 00f767b3 or a5,a4,a5 10230: 0025d713 srl a4,a1,0x2 10234: 00b76733 or a4,a4,a1 10238: 00a7e533 or a0,a5,a0 1023c: 01c71693 sll a3,a4,0x1c 10240: 00455793 srl a5,a0,0x4 10244: 00f6e7b3 or a5,a3,a5 10248: 00475693 srl a3,a4,0x4 1024c: 00e6e6b3 or a3,a3,a4 10250: 00a7e733 or a4,a5,a0 10254: 01869613 sll a2,a3,0x18 10258: 00875793 srl a5,a4,0x8 1025c: 00f667b3 or a5,a2,a5 10260: 0086d613 srl a2,a3,0x8 10264: 00d66633 or a2,a2,a3 10268: 00e7e7b3 or a5,a5,a4 1026c: 0107d693 srl a3,a5,0x10 10270: 01061713 sll a4,a2,0x10 10274: 00d766b3 or a3,a4,a3 10278: 01065713 srl a4,a2,0x10 1027c: 00c76733 or a4,a4,a2 10280: 00f6e6b3 or a3,a3,a5 10284: 00d766b3 or a3,a4,a3 10288: 01f71593 sll a1,a4,0x1f 1028c: 0016d793 srl a5,a3,0x1 10290: 55555637 lui a2,0x55555 10294: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x555377d1> 10298: 00f5e7b3 or a5,a1,a5 1029c: 00c7f7b3 and a5,a5,a2 102a0: 00175593 srl a1,a4,0x1 102a4: 40f687b3 sub a5,a3,a5 102a8: 00c5f633 and a2,a1,a2 102ac: 00f6b6b3 sltu a3,a3,a5 102b0: 40c70733 sub a4,a4,a2 102b4: 40d70733 sub a4,a4,a3 102b8: 01e71593 sll a1,a4,0x1e 102bc: 0027d693 srl a3,a5,0x2 102c0: 33333637 lui a2,0x33333 102c4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155af> 102c8: 00d5e6b3 or a3,a1,a3 102cc: 00c6f6b3 and a3,a3,a2 102d0: 00275593 srl a1,a4,0x2 102d4: 00c7f7b3 and a5,a5,a2 102d8: 00f687b3 add a5,a3,a5 102dc: 00c5f5b3 and a1,a1,a2 102e0: 00c77733 and a4,a4,a2 102e4: 00e58733 add a4,a1,a4 102e8: 00d7b6b3 sltu a3,a5,a3 102ec: 00e686b3 add a3,a3,a4 102f0: 01c69613 sll a2,a3,0x1c 102f4: 0047d713 srl a4,a5,0x4 102f8: 00e66733 or a4,a2,a4 102fc: 00f707b3 add a5,a4,a5 10300: 0046d613 srl a2,a3,0x4 10304: 00d60633 add a2,a2,a3 10308: 00e7b733 sltu a4,a5,a4 1030c: 0f0f16b7 lui a3,0xf0f1 10310: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b> 10314: 00c70733 add a4,a4,a2 10318: 00d77733 and a4,a4,a3 1031c: 00d7f7b3 and a5,a5,a3 10320: 01871613 sll a2,a4,0x18 10324: 0087d693 srl a3,a5,0x8 10328: 00d666b3 or a3,a2,a3 1032c: 00f687b3 add a5,a3,a5 10330: 00875613 srl a2,a4,0x8 10334: 00e60733 add a4,a2,a4 10338: 00d7b6b3 sltu a3,a5,a3 1033c: 00e686b3 add a3,a3,a4 10340: 01069613 sll a2,a3,0x10 10344: 0107d713 srl a4,a5,0x10 10348: 00e66733 or a4,a2,a4 1034c: 00f707b3 add a5,a4,a5 10350: 0106d613 srl a2,a3,0x10 10354: 00e7b733 sltu a4,a5,a4 10358: 00d606b3 add a3,a2,a3 1035c: 00d70733 add a4,a4,a3 10360: 00f70733 add a4,a4,a5 10364: 07f77713 and a4,a4,127 10368: 04000513 li a0,64 1036c: 40e50533 sub a0,a0,a4 10370: 01051513 sll a0,a0,0x10 10374: 01055513 srl a0,a0,0x10 10378: 00008067 ret 0001037c <HammingDistance>: 1037c: 00050713 mv a4,a0 10380: 00060893 mv a7,a2 10384: 00068813 mv a6,a3 10388: 1ab6ee63 bltu a3,a1,10544 <HammingDistance+0x1c8> 1038c: 1ad58a63 beq a1,a3,10540 <HammingDistance+0x1c4> 10390: 01f81793 sll a5,a6,0x1f 10394: 0018d513 srl a0,a7,0x1 10398: 00a7e533 or a0,a5,a0 1039c: 00185793 srl a5,a6,0x1 103a0: 0107e7b3 or a5,a5,a6 103a4: 01156833 or a6,a0,a7 103a8: 00285513 srl a0,a6,0x2 103ac: 01e79893 sll a7,a5,0x1e 103b0: 00a8e533 or a0,a7,a0 103b4: 0027d893 srl a7,a5,0x2 103b8: 0117e7b3 or a5,a5,a7 103bc: 00a86833 or a6,a6,a0 103c0: 01c79893 sll a7,a5,0x1c 103c4: 00485513 srl a0,a6,0x4 103c8: 00a8e533 or a0,a7,a0 103cc: 0047d893 srl a7,a5,0x4 103d0: 0117e7b3 or a5,a5,a7 103d4: 00a86833 or a6,a6,a0 103d8: 01879893 sll a7,a5,0x18 103dc: 00885513 srl a0,a6,0x8 103e0: 00a8e533 or a0,a7,a0 103e4: 0087d893 srl a7,a5,0x8 103e8: 0117e7b3 or a5,a5,a7 103ec: 00a86833 or a6,a6,a0 103f0: 01079893 sll a7,a5,0x10 103f4: 01085513 srl a0,a6,0x10 103f8: 00a8e533 or a0,a7,a0 103fc: 0107d893 srl a7,a5,0x10 10400: 0117e7b3 or a5,a5,a7 10404: 00a86833 or a6,a6,a0 10408: 00f86833 or a6,a6,a5 1040c: 01f79313 sll t1,a5,0x1f 10410: 00185513 srl a0,a6,0x1 10414: 555558b7 lui a7,0x55555 10418: 55588893 add a7,a7,1365 # 55555555 <__BSS_END__+0x555377d1> 1041c: 00a36533 or a0,t1,a0 10420: 01157533 and a0,a0,a7 10424: 0017d313 srl t1,a5,0x1 10428: 40a80533 sub a0,a6,a0 1042c: 011378b3 and a7,t1,a7 10430: 00a83833 sltu a6,a6,a0 10434: 411787b3 sub a5,a5,a7 10438: 410787b3 sub a5,a5,a6 1043c: 01e79313 sll t1,a5,0x1e 10440: 00255813 srl a6,a0,0x2 10444: 333338b7 lui a7,0x33333 10448: 33388893 add a7,a7,819 # 33333333 <__BSS_END__+0x333155af> 1044c: 01036833 or a6,t1,a6 10450: 01187833 and a6,a6,a7 10454: 0027d313 srl t1,a5,0x2 10458: 01157533 and a0,a0,a7 1045c: 00a80533 add a0,a6,a0 10460: 01137333 and t1,t1,a7 10464: 0117f7b3 and a5,a5,a7 10468: 01053833 sltu a6,a0,a6 1046c: 00f307b3 add a5,t1,a5 10470: 00f80833 add a6,a6,a5 10474: 01c81893 sll a7,a6,0x1c 10478: 00455793 srl a5,a0,0x4 1047c: 00f8e7b3 or a5,a7,a5 10480: 00a78533 add a0,a5,a0 10484: 00485893 srl a7,a6,0x4 10488: 010888b3 add a7,a7,a6 1048c: 00f537b3 sltu a5,a0,a5 10490: 0f0f1837 lui a6,0xf0f1 10494: f0f80813 add a6,a6,-241 # f0f0f0f <__BSS_END__+0xf0d318b> 10498: 011787b3 add a5,a5,a7 1049c: 0107f7b3 and a5,a5,a6 104a0: 01057533 and a0,a0,a6 104a4: 01879893 sll a7,a5,0x18 104a8: 00855813 srl a6,a0,0x8 104ac: 0108e833 or a6,a7,a6 104b0: 01050833 add a6,a0,a6 104b4: 0087d893 srl a7,a5,0x8 104b8: 011787b3 add a5,a5,a7 104bc: 00a83533 sltu a0,a6,a0 104c0: 00f50533 add a0,a0,a5 104c4: 01051893 sll a7,a0,0x10 104c8: 01085793 srl a5,a6,0x10 104cc: 00f8e7b3 or a5,a7,a5 104d0: 00f807b3 add a5,a6,a5 104d4: 01055893 srl a7,a0,0x10 104d8: 0107b833 sltu a6,a5,a6 104dc: 01150533 add a0,a0,a7 104e0: 00a80833 add a6,a6,a0 104e4: 010787b3 add a5,a5,a6 104e8: 07f7f513 and a0,a5,127 104ec: 00050793 mv a5,a0 104f0: 06050063 beqz a0,10550 <HammingDistance+0x1d4> 104f4: 00000513 li a0,0 104f8: fff78793 add a5,a5,-1 104fc: 00c74833 xor a6,a4,a2 10500: 01079893 sll a7,a5,0x10 10504: 01f59e13 sll t3,a1,0x1f 10508: 01f69313 sll t1,a3,0x1f 1050c: 00187813 and a6,a6,1 10510: 00175713 srl a4,a4,0x1 10514: 00165613 srl a2,a2,0x1 10518: 01079793 sll a5,a5,0x10 1051c: 0108d893 srl a7,a7,0x10 10520: 01050533 add a0,a0,a6 10524: 00ee6733 or a4,t3,a4 10528: 0015d593 srl a1,a1,0x1 1052c: 00c36633 or a2,t1,a2 10530: 0016d693 srl a3,a3,0x1 10534: 4107d793 sra a5,a5,0x10 10538: fc0890e3 bnez a7,104f8 <HammingDistance+0x17c> 1053c: 00008067 ret 10540: e4a678e3 bgeu a2,a0,10390 <HammingDistance+0x14> 10544: 00070893 mv a7,a4 10548: 00058813 mv a6,a1 1054c: e45ff06f j 10390 <HammingDistance+0x14> 10550: 00000513 li a0,0 10554: 00008067 ret ``` ::: #### Observation * Register Usage | Register | CLZ | Hammingdistnce | |:--------:|:--------------:|:--------------:| | a-type | 6 (a0-a5) | 8(a0-a7) | | s-type | 0 | 0 | | t-type | 0 | 2(t1, t3) | | other | 0 | 0 | * Stack Usage | Stack | CLZ | Hammingdistance | | ----- | --- |:---------------:| | Usage |0 | 0 | * Lines of code/`lw`/`sw` | Line of code | lw | sw | |:------------:| --- | --- | | 214 | 0 | 0 | Compare to -O2, line of code is longer because of loop unrolling. ### Os Optimized Optimize for size ```c $ ../../build/rv32emu Hammingdist.elf Hamming Distance = 21 Hamming Distance = 63 Hamming Distance = 0 cycle count: 7400 instret: 2cb inferior exit code 0 $ riscv-none-elf-readelf -h Hammingdist.elf ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x1018a Start of program headers: 52 (bytes into file) Start of section headers: 69564 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 $ riscv-none-elf-size Hammingdist.elf text data bss dec hex filename 51786 1928 1528 55242 d7ca Hammingdist.elf ``` :::spoiler **Disassemble** ```c 0001020c <count_leading_zeros>: 1020c: 01f59713 sll a4,a1,0x1f 10210: 00155793 srl a5,a0,0x1 10214: 00f767b3 or a5,a4,a5 10218: 0015d713 srl a4,a1,0x1 1021c: 00b765b3 or a1,a4,a1 10220: 00a7e533 or a0,a5,a0 10224: 01e59713 sll a4,a1,0x1e 10228: 00255793 srl a5,a0,0x2 1022c: 00f767b3 or a5,a4,a5 10230: 0025d613 srl a2,a1,0x2 10234: 00b66633 or a2,a2,a1 10238: 00a7e533 or a0,a5,a0 1023c: 01c61713 sll a4,a2,0x1c 10240: 00455793 srl a5,a0,0x4 10244: 00f767b3 or a5,a4,a5 10248: 00465693 srl a3,a2,0x4 1024c: 00a7e733 or a4,a5,a0 10250: 00c6e6b3 or a3,a3,a2 10254: 01869613 sll a2,a3,0x18 10258: 00875793 srl a5,a4,0x8 1025c: 00f667b3 or a5,a2,a5 10260: 0086d613 srl a2,a3,0x8 10264: 00d66633 or a2,a2,a3 10268: 00e7e7b3 or a5,a5,a4 1026c: 0107d693 srl a3,a5,0x10 10270: 01061713 sll a4,a2,0x10 10274: 00d766b3 or a3,a4,a3 10278: 01065713 srl a4,a2,0x10 1027c: 00c76733 or a4,a4,a2 10280: 00f6e6b3 or a3,a3,a5 10284: 00d766b3 or a3,a4,a3 10288: 01f71613 sll a2,a4,0x1f 1028c: 0016d793 srl a5,a3,0x1 10290: 00f667b3 or a5,a2,a5 10294: 55555637 lui a2,0x55555 10298: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x555377d1> 1029c: 00175593 srl a1,a4,0x1 102a0: 00c7f7b3 and a5,a5,a2 102a4: 40f687b3 sub a5,a3,a5 102a8: 00c5f633 and a2,a1,a2 102ac: 00f6b6b3 sltu a3,a3,a5 102b0: 40c70733 sub a4,a4,a2 102b4: 40d70733 sub a4,a4,a3 102b8: 01e71613 sll a2,a4,0x1e 102bc: 0027d693 srl a3,a5,0x2 102c0: 00d666b3 or a3,a2,a3 102c4: 33333637 lui a2,0x33333 102c8: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155af> 102cc: 00c6f6b3 and a3,a3,a2 102d0: 00275593 srl a1,a4,0x2 102d4: 00c7f7b3 and a5,a5,a2 102d8: 00c5f5b3 and a1,a1,a2 102dc: 00f687b3 add a5,a3,a5 102e0: 00c77733 and a4,a4,a2 102e4: 00e58733 add a4,a1,a4 102e8: 00d7b6b3 sltu a3,a5,a3 102ec: 00e686b3 add a3,a3,a4 102f0: 01c69613 sll a2,a3,0x1c 102f4: 0047d713 srl a4,a5,0x4 102f8: 00e66733 or a4,a2,a4 102fc: 00f707b3 add a5,a4,a5 10300: 0046d613 srl a2,a3,0x4 10304: 00d606b3 add a3,a2,a3 10308: 00e7b733 sltu a4,a5,a4 1030c: 00d70733 add a4,a4,a3 10310: 0f0f16b7 lui a3,0xf0f1 10314: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b> 10318: 00d77733 and a4,a4,a3 1031c: 00d7f7b3 and a5,a5,a3 10320: 01871613 sll a2,a4,0x18 10324: 0087d693 srl a3,a5,0x8 10328: 00d666b3 or a3,a2,a3 1032c: 00f687b3 add a5,a3,a5 10330: 00875613 srl a2,a4,0x8 10334: 00e60733 add a4,a2,a4 10338: 00d7b6b3 sltu a3,a5,a3 1033c: 00e686b3 add a3,a3,a4 10340: 01069613 sll a2,a3,0x10 10344: 0107d713 srl a4,a5,0x10 10348: 00e66733 or a4,a2,a4 1034c: 00f707b3 add a5,a4,a5 10350: 0106d613 srl a2,a3,0x10 10354: 00e7b733 sltu a4,a5,a4 10358: 00d606b3 add a3,a2,a3 1035c: 00d70733 add a4,a4,a3 10360: 00f70733 add a4,a4,a5 10364: 07f77713 and a4,a4,127 10368: 04000513 li a0,64 1036c: 40e50533 sub a0,a0,a4 10370: 01051513 sll a0,a0,0x10 10374: 01055513 srl a0,a0,0x10 10378: 00008067 ret 0001037c <HammingDistance>: 1037c: fe010113 add sp,sp,-32 10380: 00812c23 sw s0,24(sp) 10384: 00912a23 sw s1,20(sp) 10388: 01212823 sw s2,16(sp) 1038c: 01312623 sw s3,12(sp) 10390: 00112e23 sw ra,28(sp) 10394: 00050413 mv s0,a0 10398: 00058913 mv s2,a1 1039c: 00060493 mv s1,a2 103a0: 00068993 mv s3,a3 103a4: 00b6ea63 bltu a3,a1,103b8 <HammingDistance+0x3c> 103a8: 00060513 mv a0,a2 103ac: 00068593 mv a1,a3 103b0: 00d91863 bne s2,a3,103c0 <HammingDistance+0x44> 103b4: 00867663 bgeu a2,s0,103c0 <HammingDistance+0x44> 103b8: 00040513 mv a0,s0 103bc: 00090593 mv a1,s2 103c0: e4dff0ef jal 1020c <count_leading_zeros> 103c4: 04000793 li a5,64 103c8: 40a787b3 sub a5,a5,a0 103cc: 01079793 sll a5,a5,0x10 103d0: 4107d793 sra a5,a5,0x10 103d4: 00000513 li a0,0 103d8: 02f04063 bgtz a5,103f8 <HammingDistance+0x7c> 103dc: 01c12083 lw ra,28(sp) 103e0: 01812403 lw s0,24(sp) 103e4: 01412483 lw s1,20(sp) 103e8: 01012903 lw s2,16(sp) 103ec: 00c12983 lw s3,12(sp) 103f0: 02010113 add sp,sp,32 103f4: 00008067 ret 103f8: 00944733 xor a4,s0,s1 103fc: 00177713 and a4,a4,1 10400: 00070463 beqz a4,10408 <HammingDistance+0x8c> 10404: 00150513 add a0,a0,1 10408: 01f91713 sll a4,s2,0x1f 1040c: 00145413 srl s0,s0,0x1 10410: fff78793 add a5,a5,-1 10414: 00876433 or s0,a4,s0 10418: 0014d493 srl s1,s1,0x1 1041c: 01f99713 sll a4,s3,0x1f 10420: 01079793 sll a5,a5,0x10 10424: 00195913 srl s2,s2,0x1 10428: 009764b3 or s1,a4,s1 1042c: 0019d993 srl s3,s3,0x1 10430: 4107d793 sra a5,a5,0x10 10434: fa5ff06f j 103d8 <HammingDistance+0x5c> ``` ::: #### Observation * Register Usage | Register | CLZ | Hammingdistnce | |:--------:|:--------------:|:--------------:| | a-type | 6 (a0-a5) | 6(a0-a5) | | s-type | 0 | 4(s0-s3) | | t-type | 0 | 0 | | other | 0 | ra, sp | * Stack Usage | Stack | CLZ | Hammingdistance | | ----- | --- |:---------------:| | Usage |0 | 32 | * Lines of code/`lw`/`sw` | Line of code | lw | sw | |:------------:| --- | --- | | 142 | 5 | 5 | ### `Ofast` Optimized for speed ```c $ ../../build/rv32emu Hammingdist.elf Hamming Distance = 21 Hamming Distance = 63 Hamming Distance = 0 cycle count: 7363 instret: 2cb inferior exit code 0 $ riscv-none-elf-readelf -h Hammingdist.elf ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x102ee Start of program headers: 52 (bytes into file) Start of section headers: 69660 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 $ riscv-none-elf-size Hammingdist.elf text data bss dec hex filename 52074 1928 1528 55530 d8ea Hammingdist.elf ``` :::spoiler **Disassemble** ```c 0001020c <count_leading_zeros>: 1020c: 01f59713 sll a4,a1,0x1f 10210: 00155793 srl a5,a0,0x1 10214: 00f767b3 or a5,a4,a5 10218: 0015d713 srl a4,a1,0x1 1021c: 00b765b3 or a1,a4,a1 10220: 00a7e533 or a0,a5,a0 10224: 01e59713 sll a4,a1,0x1e 10228: 00255793 srl a5,a0,0x2 1022c: 00f767b3 or a5,a4,a5 10230: 0025d713 srl a4,a1,0x2 10234: 00b76733 or a4,a4,a1 10238: 00a7e533 or a0,a5,a0 1023c: 01c71693 sll a3,a4,0x1c 10240: 00455793 srl a5,a0,0x4 10244: 00f6e7b3 or a5,a3,a5 10248: 00475693 srl a3,a4,0x4 1024c: 00e6e6b3 or a3,a3,a4 10250: 00a7e733 or a4,a5,a0 10254: 01869613 sll a2,a3,0x18 10258: 00875793 srl a5,a4,0x8 1025c: 00f667b3 or a5,a2,a5 10260: 0086d613 srl a2,a3,0x8 10264: 00d66633 or a2,a2,a3 10268: 00e7e7b3 or a5,a5,a4 1026c: 0107d693 srl a3,a5,0x10 10270: 01061713 sll a4,a2,0x10 10274: 00d766b3 or a3,a4,a3 10278: 01065713 srl a4,a2,0x10 1027c: 00c76733 or a4,a4,a2 10280: 00f6e6b3 or a3,a3,a5 10284: 00d766b3 or a3,a4,a3 10288: 01f71593 sll a1,a4,0x1f 1028c: 0016d793 srl a5,a3,0x1 10290: 55555637 lui a2,0x55555 10294: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x555377d1> 10298: 00f5e7b3 or a5,a1,a5 1029c: 00c7f7b3 and a5,a5,a2 102a0: 00175593 srl a1,a4,0x1 102a4: 40f687b3 sub a5,a3,a5 102a8: 00c5f633 and a2,a1,a2 102ac: 00f6b6b3 sltu a3,a3,a5 102b0: 40c70733 sub a4,a4,a2 102b4: 40d70733 sub a4,a4,a3 102b8: 01e71593 sll a1,a4,0x1e 102bc: 0027d693 srl a3,a5,0x2 102c0: 33333637 lui a2,0x33333 102c4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155af> 102c8: 00d5e6b3 or a3,a1,a3 102cc: 00c6f6b3 and a3,a3,a2 102d0: 00275593 srl a1,a4,0x2 102d4: 00c7f7b3 and a5,a5,a2 102d8: 00f687b3 add a5,a3,a5 102dc: 00c5f5b3 and a1,a1,a2 102e0: 00c77733 and a4,a4,a2 102e4: 00e58733 add a4,a1,a4 102e8: 00d7b6b3 sltu a3,a5,a3 102ec: 00e686b3 add a3,a3,a4 102f0: 01c69613 sll a2,a3,0x1c 102f4: 0047d713 srl a4,a5,0x4 102f8: 00e66733 or a4,a2,a4 102fc: 00f707b3 add a5,a4,a5 10300: 0046d613 srl a2,a3,0x4 10304: 00d60633 add a2,a2,a3 10308: 00e7b733 sltu a4,a5,a4 1030c: 0f0f16b7 lui a3,0xf0f1 10310: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b> 10314: 00c70733 add a4,a4,a2 10318: 00d77733 and a4,a4,a3 1031c: 00d7f7b3 and a5,a5,a3 10320: 01871613 sll a2,a4,0x18 10324: 0087d693 srl a3,a5,0x8 10328: 00d666b3 or a3,a2,a3 1032c: 00f687b3 add a5,a3,a5 10330: 00875613 srl a2,a4,0x8 10334: 00e60733 add a4,a2,a4 10338: 00d7b6b3 sltu a3,a5,a3 1033c: 00e686b3 add a3,a3,a4 10340: 01069613 sll a2,a3,0x10 10344: 0107d713 srl a4,a5,0x10 10348: 00e66733 or a4,a2,a4 1034c: 00f707b3 add a5,a4,a5 10350: 0106d613 srl a2,a3,0x10 10354: 00e7b733 sltu a4,a5,a4 10358: 00d606b3 add a3,a2,a3 1035c: 00d70733 add a4,a4,a3 10360: 00f70733 add a4,a4,a5 10364: 07f77713 and a4,a4,127 10368: 04000513 li a0,64 1036c: 40e50533 sub a0,a0,a4 10370: 01051513 sll a0,a0,0x10 10374: 01055513 srl a0,a0,0x10 10378: 00008067 ret 0001037c <HammingDistance>: 1037c: 00050713 mv a4,a0 10380: 00060893 mv a7,a2 10384: 00068813 mv a6,a3 10388: 1ab6ee63 bltu a3,a1,10544 <HammingDistance+0x1c8> 1038c: 1ad58a63 beq a1,a3,10540 <HammingDistance+0x1c4> 10390: 01f81793 sll a5,a6,0x1f 10394: 0018d513 srl a0,a7,0x1 10398: 00a7e533 or a0,a5,a0 1039c: 00185793 srl a5,a6,0x1 103a0: 0107e7b3 or a5,a5,a6 103a4: 01156833 or a6,a0,a7 103a8: 00285513 srl a0,a6,0x2 103ac: 01e79893 sll a7,a5,0x1e 103b0: 00a8e533 or a0,a7,a0 103b4: 0027d893 srl a7,a5,0x2 103b8: 0117e7b3 or a5,a5,a7 103bc: 00a86833 or a6,a6,a0 103c0: 01c79893 sll a7,a5,0x1c 103c4: 00485513 srl a0,a6,0x4 103c8: 00a8e533 or a0,a7,a0 103cc: 0047d893 srl a7,a5,0x4 103d0: 0117e7b3 or a5,a5,a7 103d4: 00a86833 or a6,a6,a0 103d8: 01879893 sll a7,a5,0x18 103dc: 00885513 srl a0,a6,0x8 103e0: 00a8e533 or a0,a7,a0 103e4: 0087d893 srl a7,a5,0x8 103e8: 0117e7b3 or a5,a5,a7 103ec: 00a86833 or a6,a6,a0 103f0: 01079893 sll a7,a5,0x10 103f4: 01085513 srl a0,a6,0x10 103f8: 00a8e533 or a0,a7,a0 103fc: 0107d893 srl a7,a5,0x10 10400: 0117e7b3 or a5,a5,a7 10404: 00a86833 or a6,a6,a0 10408: 00f86833 or a6,a6,a5 1040c: 01f79313 sll t1,a5,0x1f 10410: 00185513 srl a0,a6,0x1 10414: 555558b7 lui a7,0x55555 10418: 55588893 add a7,a7,1365 # 55555555 <__BSS_END__+0x555377d1> 1041c: 00a36533 or a0,t1,a0 10420: 01157533 and a0,a0,a7 10424: 0017d313 srl t1,a5,0x1 10428: 40a80533 sub a0,a6,a0 1042c: 011378b3 and a7,t1,a7 10430: 00a83833 sltu a6,a6,a0 10434: 411787b3 sub a5,a5,a7 10438: 410787b3 sub a5,a5,a6 1043c: 01e79313 sll t1,a5,0x1e 10440: 00255813 srl a6,a0,0x2 10444: 333338b7 lui a7,0x33333 10448: 33388893 add a7,a7,819 # 33333333 <__BSS_END__+0x333155af> 1044c: 01036833 or a6,t1,a6 10450: 01187833 and a6,a6,a7 10454: 0027d313 srl t1,a5,0x2 10458: 01157533 and a0,a0,a7 1045c: 00a80533 add a0,a6,a0 10460: 01137333 and t1,t1,a7 10464: 0117f7b3 and a5,a5,a7 10468: 01053833 sltu a6,a0,a6 1046c: 00f307b3 add a5,t1,a5 10470: 00f80833 add a6,a6,a5 10474: 01c81893 sll a7,a6,0x1c 10478: 00455793 srl a5,a0,0x4 1047c: 00f8e7b3 or a5,a7,a5 10480: 00a78533 add a0,a5,a0 10484: 00485893 srl a7,a6,0x4 10488: 010888b3 add a7,a7,a6 1048c: 00f537b3 sltu a5,a0,a5 10490: 0f0f1837 lui a6,0xf0f1 10494: f0f80813 add a6,a6,-241 # f0f0f0f <__BSS_END__+0xf0d318b> 10498: 011787b3 add a5,a5,a7 1049c: 0107f7b3 and a5,a5,a6 104a0: 01057533 and a0,a0,a6 104a4: 01879893 sll a7,a5,0x18 104a8: 00855813 srl a6,a0,0x8 104ac: 0108e833 or a6,a7,a6 104b0: 01050833 add a6,a0,a6 104b4: 0087d893 srl a7,a5,0x8 104b8: 011787b3 add a5,a5,a7 104bc: 00a83533 sltu a0,a6,a0 104c0: 00f50533 add a0,a0,a5 104c4: 01051893 sll a7,a0,0x10 104c8: 01085793 srl a5,a6,0x10 104cc: 00f8e7b3 or a5,a7,a5 104d0: 00f807b3 add a5,a6,a5 104d4: 01055893 srl a7,a0,0x10 104d8: 0107b833 sltu a6,a5,a6 104dc: 01150533 add a0,a0,a7 104e0: 00a80833 add a6,a6,a0 104e4: 010787b3 add a5,a5,a6 104e8: 07f7f513 and a0,a5,127 104ec: 00050793 mv a5,a0 104f0: 06050063 beqz a0,10550 <HammingDistance+0x1d4> 104f4: 00000513 li a0,0 104f8: fff78793 add a5,a5,-1 104fc: 00c74833 xor a6,a4,a2 10500: 01079893 sll a7,a5,0x10 10504: 01f59e13 sll t3,a1,0x1f 10508: 01f69313 sll t1,a3,0x1f 1050c: 00187813 and a6,a6,1 10510: 00175713 srl a4,a4,0x1 10514: 00165613 srl a2,a2,0x1 10518: 01079793 sll a5,a5,0x10 1051c: 0108d893 srl a7,a7,0x10 10520: 01050533 add a0,a0,a6 10524: 00ee6733 or a4,t3,a4 10528: 0015d593 srl a1,a1,0x1 1052c: 00c36633 or a2,t1,a2 10530: 0016d693 srl a3,a3,0x1 10534: 4107d793 sra a5,a5,0x10 10538: fc0890e3 bnez a7,104f8 <HammingDistance+0x17c> 1053c: 00008067 ret 10540: e4a678e3 bgeu a2,a0,10390 <HammingDistance+0x14> 10544: 00070893 mv a7,a4 10548: 00058813 mv a6,a1 1054c: e45ff06f j 10390 <HammingDistance+0x14> 10550: 00000513 li a0,0 10554: 00008067 ret ``` ::: #### Observation * Register Usage | Register | CLZ | Hammingdistnce | |:--------:|:--------------:|:--------------:| | a-type | 6 (a0-a5) | 8(a0-a7) | | s-type | 0 | 4(s0-s3) | | t-type | 0 | 2(t1, t3) | | other | 0 | 0 | * Stack Usage | Stack | CLZ | Hammingdistance | | ----- | --- |:---------------:| | Usage |0 | 0 | * Lines of code/`lw`/`sw` | Line of code | lw | sw | |:------------:| --- | --- | | 214 | 0 | 0 | ### Conclusion | Optimaztion level | O0 | O1 | O2 | O3 | Os | Ofast | |:-----------------:| ----- | ----- | ----- | ----- | ----- | ----- | | cycle counts | 11190 | 7198 | 7435 | 7363 | 7400 | 7363 | | dec(from header) | 56362 | 55246 | 55250 | 55530 | 55242 | 55530 | * **From -O0 to -O1** : Not only the usage `lw` instructions but also the registers are decreased significantly.Besides, the lines of code has been reduced by over half. * **From -O1 to -O2** : The results are nearly identical, but it may be due to some flags not being suitable for the code, resulting in an increase in the number of cycles. * **From -O2 to -O3 and O-fast** : Optimization has reached its limit. Utilizing the `t-type` register instead of the `s-type` register, and both stack usage and `lw`/`sw` instructions are zero. * **From -O1 to -Os** : -Os is Optimize for size. `dec` is the summition of all three segments, `dec = .text + .data + .bss`.We can see that -Os optimization has the least amount of `dec`. ## Handwritten Assembly Code ### Original assembly code from [author_YP](https://hackmd.io/@O6C2C3zQQBanDM55QRZ7DQ) :::spoiler Original assembly code ```c .data test_data_1: .dword 0x0000000000100000, 0x00000000000FFFFF # HD(1048576, 1048575) = 21 test_data_2: .dword 0x0000000000000001, 0x7FFFFFFFFFFFFFFE # HD(1, 9223372036854775806) = 63 test_data_3: .dword 0x000000028370228F, 0x000000028370228F # HD(10795098767, 10795098767) = 0 msg_string: .string "\nHamming Distance=" .text main: addi sp, sp, -12 # push pointers of test data onto the stack la t0, test_data_1 sw t0, 0(sp) la t0, test_data_2 sw t0, 4(sp) la t0, test_data_3 sw t0, 8(sp) # initialize main_loop addi s0, zero, 3 # s0 : number of test case addi s1, zero, 0 # s1 : test case counter addi s2, sp, 0 # s2 : points to test_data_1 main_loop: la a0, msg_string li a7, 4 # print string ecall lw a0, 0(s2) # a0 : pointer to the first data in test_data_1 addi a1, a0, 8 # a1 : pointer to the second data in test_data_1 jal ra, hd_func # print the result # li a7, 1 # print integer ecall # print result of hd_cal (which is in a0) addi s2, s2, 4 # s2 : points to next test_data addi s1, s1, 1 # counter++ bne s1, s0, main_loop addi sp, sp, 12 li a7, 10 ecall # hamming distance function hd_func: addi sp, sp, -36 sw ra, 0(sp) sw s0, 4(sp) # address of x0 sw s1, 8(sp) # address of x1 sw s2, 12(sp) # digit of x0 sw s3, 16(sp) # digit of x1 sw s4, 20(sp) # lower part of x0 sw s5, 24(sp) # higher part of x0 sw s6, 28(sp) # lower part of x1 sw s7, 32(sp) # higher part of x1 # get address of x0 and x1 mv s0, a0 # s0 : address of x0 mv s1, a1 # s1 : address of x1 # get x0_digit lw a0, 0(s0) # a0 : lower part of x0 lw a1, 4(s0) # a1 : higher part of x0 jal ra clz li s2, 64 sub s2, s2, a0 # s2 : x0_digit (return value saved in a0) # get x1_digit lw a0, 0(s1) # a0 : lower part of x1 lw a1, 4(s1) # a1 : higher part of x1 jal ra clz li s3, 64 sub s3, s3, a0 # s3 : x1_digit (return value saved in a0) # get x0(s5 s4) and x1(s7 s6) lw s4, 0(s0) lw s5, 4(s0) lw s6, 0(s1) lw s7, 4(s1) # compare with two digit slt t0, s2, s3 bne t0, zero, x1_larger mv s3, zero # s3: hd counter bgt s2, zero, hd_cal_loop # when digit is 0 mv a0, s2 # save max_digit to a0 j hd_func_end x1_larger: mv s2, s3 # s2 : max_digit mv s3, zero # s3: hd counter bgt s2, zero, hd_cal_loop # when digit is 0 mv a0, s2 # save max_digit to a0 j hd_func_end hd_func_end: lw ra, 0(sp) lw s0, 4(sp) lw s1, 8(sp) lw s2, 12(sp) lw s3, 16(sp) lw s4, 20(sp) lw s5, 24(sp) lw s6, 28(sp) lw s7, 32(sp) addi sp, sp, 36 ret # hamming distance calculation (result save in a0, a1) hd_cal_loop: # when the current digit larger than 32 addi t2, zero, 32 bgt s2, t2, hd_getLSB_upper # hd_getLSB_lower : and with 1 li t3, 0x00000001 and t4, s4, t3 and t5, s6, t3 j hd_cal_shift hd_getLSB_upper: # and with 1 li t3, 0x00000001 and t4, s5, t3 and t5, s7, t3 hd_cal_shift: # (s5 s4) = x >> 1 srli t0, s4, 1 slli t1, s5, 31 or s4, t0, t1 # s4 >> 1 srli s5, s5, 1 # s5 >> 1 # (s7 s6) = x >> 1 srli t0, s6, 1 slli t1, s7, 31 or s6, t0, t1 # s6 >> 1 srli s7, s7, 1 # s7 >> 1 beq t4, t5, hd_check_loop addi s3, s3, 1 hd_check_loop: addi s2, s2, -1 bne s2, zero, hd_cal_loop mv a0, s3 # save return value to a0 j hd_func_end # count leading zeros clz: addi sp, sp, -4 sw ra, 0(sp) beq a1, zero, clz_lower_set_one clz_upper_set_one: srli t1, a1, 1 or a1, a1, t1 srli t1, a1, 2 or a1, a1, t1 srli t1, a1, 4 or a1, a1, t1 srli t1, a1, 8 or a1, a1, t1 srli t1, a1, 16 or a1, a1, t1 li a0, 0xffffffff j clz_count_ones clz_lower_set_one: srli t0, a0, 1 or a0, a0, t0 srli t0, a0, 2 or a0, a0, t0 srli t0, a0, 4 or a0, a0, t0 srli t0, a0, 8 or a0, a0, t0 srli t0, a0, 16 or a0, a0, t0 clz_count_ones: # x = (a1 a0) # x -= ((x >> 1) & 0x5555555555555555); # srli t0, a0, 1 slli t1, a1, 31 or t0, t0, t1 # t0 >> 1 srli t1, a1, 1 # t1 >> 1 li t2, 0x55555555 and t0, t0, t2 and t1, t1, t2 sltu t3, a0, t0 # t3 : borrow bit sub a0, a0, t0 sub a1, a1, t1 sub a1, a1, t3 # x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333); # srli t0, a0, 2 slli t1, a1, 30 or t0, t0, t1 # t0 >> 2 srli t1, a1, 2 # t1 >> 2 li t2, 0x33333333 and t0, t0, t2 and t1, t1, t2 and t4, a0, t2 and t5, a1, t2 # (a1 a0) = (t1 t0) + (t5 t4) add a0, t0, t4 sltu t3, a0, t0 # t3 : carry bit add a1, t1, t5 add a1, a1, t3 # x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f; # srli t0, a0, 4 slli t1, a1, 28 or t0, t0, t1 # t0 >> 4 srli t1, a1, 4 # t1 >> 4 add t0, t0, a0 sltu t3, t0, a0 # t3 : carry bit add t1, t1, a1 add t1, t1, t3 li t2, 0x0f0f0f0f and a0, t0, t2 and a1, t1, t2 # x += (x >> 8); # srli t0, a0, 8 slli t1, a1, 24 or t0, t0, t1 # t0 >> 8 srli t1, a1, 8 # t1 >> 8 add a0, a0, t0 sltu t3, a0, t0 # t3 : carry bit add a1, a1, t1 add a1, a1, t3 # (a1 a0) += (t1 t0) # x += (x >> 16); # srli t0, a0, 16 slli t1, a1, 16 or t0, t0, t1 # t0 >> 16 srli t1, a1, 16 # t1 >> 16 add a0, a0, t0 sltu t3, a0, t0 # t3 : carry bit add a1, a1, t1 add a1, a1, t3 # (a1 a0) += (t1 t0) # x += (x >> 32); # # (t1 t0) = x >> 32 mv t0, a1 mv t1, zero add a0, a0, t0 sltu t3, a0, t0 # t3 : carry bit add a1, a1, t1 add a1, a1, t3 # (a1 a0) += (t1 t0) # return (64 - (x & 0x7f)); # a0 = (x & 0x7f) andi a0, a0, 0x7f li t0, 64 sub a0, t0, a0 # a0 = (64 - (x & 0x7f)) lw ra, 0(sp) addi sp, sp, 4 ret ``` ::: ### Modify assembly code We need to make slight modifications to the assembly code to make it compliant with the rv32emu specifications.(Please check [docs/syscall.md](https://github.com/sysprog21/rv32emu/blob/master/docs/syscall.md)) 1. **System call** If we want print string or exit in Ripes, we may use: ```c1! .data msg_string: .string "Hello world !" main: la a0, msg_string li a7, 4 # print string ecall li a7, 10 # exit ecall ``` But in rv32emu, we should modify it to: ```c1! .set STDOUT, 1 .set WRITE, 64 .set EXIT, 93 .section .rodata msg: .ascii "Hello world !" _start: li a0, STDOUT # file descriptor, 1 = stnadard output la a1, msg # address of string li a2, 7 # length of string li a7, WRITE # syscall number for write ecall # MISSING: Check for error condition li a0, 0 # 0 signals success li a7, EXIT ecall ``` 2. **Entry label** In rv32emu, we can determine where to start the program through .ld file. So we need to modify the entry label of assembly code according to .ld file. Our ld. file is shown as : ```c1! OUTPUT_ARCH("riscv") ENTRY(_start1) SECTIONS { . = 0x0; } ``` It means that _start1 is entry label, so we need to modify our assembly code as following : ```c1! .set EXIT, 93 _start1: #instead of main addi t0, t0, 1 Exit: li a0, 0 # 0 signals success li a7, EXIT ecall ``` If we didn't obey this rule, you may get the error message when you compiling your file : ```c1! riscv-none-elf-ld: warning: cannot find entry symbol _start; defaulting to 00000000 ``` ### Additional adjustments 1. `get_cycles` : In order to count the cycles in the program, I add the `get_cycles` which is shown above. 2. `syscall_printint`: Because rv32emu doesn't support printing integer, I edit system call and add an `printint` syscall. ```diff #define SUPPORTED_SYSCALLS \ + _(printint, 11) \ ... _(submit_queue, 0xFEED) \ _(setup_audio, 0xBABE) \ _(control_audio, 0xD00D), \ ) + static void syscall_printint(riscv_t *rv) + { + uint32_t buffer = rv_get_reg(rv, rv_reg_a1); + fprintf(stdout, " %d ", buffer); + } ... ``` ### Assembly Code(Modified) :::spoiler Assembly Code(Modified) ```c .global _start1 #newlib system calls .set SYSEXIT, 93 .set SYSWRITE, 64 .set SYSINT, 11 .data test_data_1: .dword 0x0000000000100000, 0x00000000000FFFFF # HD(1048576, 1048575) = 21 test_data_2: .dword 0x0000000000000001, 0x7FFFFFFFFFFFFFFE # HD(1, 9223372036854775806) = 63 test_data_3: .dword 0x000000028370228F, 0x000000028370228F # HD(10795098767, 10795098767) = 0 .section .rodata msg1: .ascii "\nHamming Distance=" msg2: .ascii "\nCycle counts :" .text _start1: addi sp, sp, -12 # push pointers of test data onto the stack la t0, test_data_1 sw t0, 0(sp) la t0, test_data_2 sw t0, 4(sp) la t0, test_data_3 sw t0, 8(sp) # initialize main_loop addi s0, zero, 3 # s0 : number of test case addi s1, zero, 0 # s1 : test case counter addi s2, sp, 0 # s2 : points to test_data_1 get_cycles_init: csrr t1, cycleh csrr s10, cycle csrr t2, cycleh bne t1, t2, get_cycles_init main_loop: li a7, SYSWRITE #"write" syscall li a0, 1 #1 = standard output (stdout) la a1, msg1 # load address of msg string li a2, 18 # length of msg string ecall lw a0, 0(s2) # a0 : pointer to the first data in test_data_1 addi a1, a0, 8 # a1 : pointer to the second data in test_data_1 jal ra, hd_func # print the result # li a7, SYSINT #"printint" syscall add a1, a0, x0 # address of string(move result of hd_cal to a1) li a0, 1 #1 = standard output (stdout) ecall # print result of hd_cal addi s2, s2, 4 # s2 : points to next test_data addi s1, s1, 1 # counter++ bne s1, s0, main_loop get_cycles_end: csrr t1, cycleh csrr s11, cycle csrr t2, cycleh bne t1, t2, get_cycles_end sub s11, s11, s10 # print the result # li a7, SYSWRITE #"write" syscall li a0, 1 #1 = standard output (stdout) la a1, msg2 # load address of msg string li a2, 15 # length of msg string ecall li a7, SYSINT #"printint" syscall add a1, s11, x0 # address of string(move result of hd_cal to a1) li a0, 1 #1 = standard output (stdout) ecall # print result of get_cycles_end addi sp, sp, 12 li a0, 0 #0 signals success li a7, SYSEXIT # "exit" syscall ecall # hamming distance function hd_func: addi sp, sp, -36 sw ra, 0(sp) sw s0, 4(sp) # address of x0 sw s1, 8(sp) # address of x1 sw s2, 12(sp) # digit of x0 sw s3, 16(sp) # digit of x1 sw s4, 20(sp) # lower part of x0 sw s5, 24(sp) # higher part of x0 sw s6, 28(sp) # lower part of x1 sw s7, 32(sp) # higher part of x1 # get address of x0 and x1 mv s0, a0 # s0 : address of x0 mv s1, a1 # s1 : address of x1 # get x0_digit lw a0, 0(s0) # a0 : lower part of x0 lw a1, 4(s0) # a1 : higher part of x0 jal ra, clz li s2, 64 sub s2, s2, a0 # s2 : x0_digit (return value saved in a0) # get x1_digit lw a0, 0(s1) # a0 : lower part of x1 lw a1, 4(s1) # a1 : higher part of x1 jal ra, clz li s3, 64 sub s3, s3, a0 # s3 : x1_digit (return value saved in a0) # get x0(s5 s4) and x1(s7 s6) lw s4, 0(s0) lw s5, 4(s0) lw s6, 0(s1) lw s7, 4(s1) # compare with two digit slt t0, s2, s3 bne t0, zero, x1_larger mv s3, zero # s3: hd counter bgt s2, zero, hd_cal_loop # when digit is 0 mv a0, s2 # save max_digit to a0 j hd_func_end x1_larger: mv s2, s3 # s2 : max_digit mv s3, zero # s3: hd counter bgt s2, zero, hd_cal_loop # when digit is 0 mv a0, s2 # save max_digit to a0 j hd_func_end hd_func_end: lw ra, 0(sp) lw s0, 4(sp) lw s1, 8(sp) lw s2, 12(sp) lw s3, 16(sp) lw s4, 20(sp) lw s5, 24(sp) lw s6, 28(sp) lw s7, 32(sp) addi sp, sp, 36 ret # hamming distance calculation (result save in a0, a1) hd_cal_loop: # when the current digit larger than 32 addi t2, zero, 32 bgt s2, t2, hd_getLSB_upper # hd_getLSB_lower : and with 1 li t3, 0x00000001 and t4, s4, t3 and t5, s6, t3 j hd_cal_shift hd_getLSB_upper: # and with 1 li t3, 0x00000001 and t4, s5, t3 and t5, s7, t3 hd_cal_shift: # (s5 s4) = x >> 1 srli t0, s4, 1 slli t1, s5, 31 or s4, t0, t1 # s4 >> 1 srli s5, s5, 1 # s5 >> 1 # (s7 s6) = x >> 1 srli t0, s6, 1 slli t1, s7, 31 or s6, t0, t1 # s6 >> 1 srli s7, s7, 1 # s7 >> 1 beq t4, t5, hd_check_loop addi s3, s3, 1 hd_check_loop: addi s2, s2, -1 bne s2, zero, hd_cal_loop mv a0, s3 # save return value to a0 j hd_func_end # count leading zeros clz: addi sp, sp, -4 sw ra, 0(sp) beq a1, zero, clz_lower_set_one clz_upper_set_one: srli t1, a1, 1 or a1, a1, t1 srli t1, a1, 2 or a1, a1, t1 srli t1, a1, 4 or a1, a1, t1 srli t1, a1, 8 or a1, a1, t1 srli t1, a1, 16 or a1, a1, t1 li a0, 0xffffffff j clz_count_ones clz_lower_set_one: srli t0, a0, 1 or a0, a0, t0 srli t0, a0, 2 or a0, a0, t0 srli t0, a0, 4 or a0, a0, t0 srli t0, a0, 8 or a0, a0, t0 srli t0, a0, 16 or a0, a0, t0 clz_count_ones: # x = (a1 a0) # x -= ((x >> 1) & 0x5555555555555555); # srli t0, a0, 1 slli t1, a1, 31 or t0, t0, t1 # t0 >> 1 srli t1, a1, 1 # t1 >> 1 li t2, 0x55555555 and t0, t0, t2 and t1, t1, t2 sltu t3, a0, t0 # t3 : borrow bit sub a0, a0, t0 sub a1, a1, t1 sub a1, a1, t3 # x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333); # srli t0, a0, 2 slli t1, a1, 30 or t0, t0, t1 # t0 >> 2 srli t1, a1, 2 # t1 >> 2 li t2, 0x33333333 and t0, t0, t2 and t1, t1, t2 and t4, a0, t2 and t5, a1, t2 # (a1 a0) = (t1 t0) + (t5 t4) add a0, t0, t4 sltu t3, a0, t0 # t3 : carry bit add a1, t1, t5 add a1, a1, t3 # x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f; # srli t0, a0, 4 slli t1, a1, 28 or t0, t0, t1 # t0 >> 4 srli t1, a1, 4 # t1 >> 4 add t0, t0, a0 sltu t3, t0, a0 # t3 : carry bit add t1, t1, a1 add t1, t1, t3 li t2, 0x0f0f0f0f and a0, t0, t2 and a1, t1, t2 # x += (x >> 8); # srli t0, a0, 8 slli t1, a1, 24 or t0, t0, t1 # t0 >> 8 srli t1, a1, 8 # t1 >> 8 add a0, a0, t0 sltu t3, a0, t0 # t3 : carry bit add a1, a1, t1 add a1, a1, t3 # (a1 a0) += (t1 t0) # x += (x >> 16); # srli t0, a0, 16 slli t1, a1, 16 or t0, t0, t1 # t0 >> 16 srli t1, a1, 16 # t1 >> 16 add a0, a0, t0 sltu t3, a0, t0 # t3 : carry bit add a1, a1, t1 add a1, a1, t3 # (a1 a0) += (t1 t0) # x += (x >> 32); # # (t1 t0) = x >> 32 mv t0, a1 mv t1, zero add a0, a0, t0 sltu t3, a0, t0 # t3 : carry bit add a1, a1, t1 add a1, a1, t3 # (a1 a0) += (t1 t0) # return (64 - (x & 0x7f)); # a0 = (x & 0x7f) andi a0, a0, 0x7f li t0, 64 sub a0, t0, a0 # a0 = (64 - (x & 0x7f)) lw ra, 0(sp) addi sp, sp, 4 ret ``` ::: ### Makefile To facilitate compilation, we can write a Makefile: ```c .PHONY: clean include ../../mk/toolchain.mk ASFLAGS = -march=rv32i_zicsr -mabi=ilp32 LDFLAGS = --oformat=elf32-littleriscv BIN = Hammingdist.elf %.o: %.S $(CROSS_COMPILE)as -R $(ASFLAGS) -c -o $@ $< all: $(BIN) Hammingdist.elf: Hammingdist.o $(CROSS_COMPILE)ld -o $@ -T Hammingdist.ld $(LDFLAGS) $< clean: $(RM) $(BIN) Hammingdist.o ``` * Because we use the `CSR` command, we need to add `_zicsr` in `ASFLAGS` additionally. ### Output ```c $ ../../build/rv32emu Hammingdist.elf Hamming Distance= 21 Hamming Distance= 63 Hamming Distance= 0 Cycle counts : 2729 inferior exit code 0 $ riscv-none-elf-readelf -h Hammingdist.elf ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x0 Start of program headers: 52 (bytes into file) Start of section headers: 5900 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 2 Size of section headers: 40 (bytes) Number of section headers: 7 Section header string table index: 6 $ riscv-none-elf-size Hammingdist.elf text data bss dec hex filename 929 0 0 929 3a1 Hammingdist.elf ``` :::spoiler **Disassemble** ```c 000000c8 <hd_func>: c8: fdc10113 add sp,sp,-36 cc: 00112023 sw ra,0(sp) d0: 00812223 sw s0,4(sp) d4: 00912423 sw s1,8(sp) d8: 01212623 sw s2,12(sp) dc: 01312823 sw s3,16(sp) e0: 01412a23 sw s4,20(sp) e4: 01512c23 sw s5,24(sp) e8: 01612e23 sw s6,28(sp) ec: 03712023 sw s7,32(sp) f0: 00050413 mv s0,a0 f4: 00058493 mv s1,a1 f8: 00042503 lw a0,0(s0) fc: 00442583 lw a1,4(s0) 100: 0e4000ef jal 1e4 <clz> 104: 04000913 li s2,64 108: 40a90933 sub s2,s2,a0 10c: 0004a503 lw a0,0(s1) 110: 0044a583 lw a1,4(s1) 114: 0d0000ef jal 1e4 <clz> 118: 04000993 li s3,64 11c: 40a989b3 sub s3,s3,a0 120: 00042a03 lw s4,0(s0) 124: 00442a83 lw s5,4(s0) 128: 0004ab03 lw s6,0(s1) 12c: 0044ab83 lw s7,4(s1) 130: 013922b3 slt t0,s2,s3 134: 00029a63 bnez t0,148 <x1_larger> 138: 00000993 li s3,0 13c: 05204663 bgtz s2,188 <hd_cal_loop> 140: 00090513 mv a0,s2 144: 0180006f j 15c <hd_func_end> 00000148 <x1_larger>: 148: 00098913 mv s2,s3 14c: 00000993 li s3,0 150: 03204c63 bgtz s2,188 <hd_cal_loop> 154: 00090513 mv a0,s2 158: 0040006f j 15c <hd_func_end> 0000015c <hd_func_end>: 15c: 00012083 lw ra,0(sp) 160: 00412403 lw s0,4(sp) 164: 00812483 lw s1,8(sp) 168: 00c12903 lw s2,12(sp) 16c: 01012983 lw s3,16(sp) 170: 01412a03 lw s4,20(sp) 174: 01812a83 lw s5,24(sp) 178: 01c12b03 lw s6,28(sp) 17c: 02012b83 lw s7,32(sp) 180: 02410113 add sp,sp,36 184: 00008067 ret 00000188 <hd_cal_loop>: 188: 02000393 li t2,32 18c: 0123ca63 blt t2,s2,1a0 <hd_getLSB_upper> 190: 00100e13 li t3,1 194: 01ca7eb3 and t4,s4,t3 198: 01cb7f33 and t5,s6,t3 19c: 0100006f j 1ac <hd_cal_shift> 000001a0 <hd_getLSB_upper>: 1a0: 00100e13 li t3,1 1a4: 01cafeb3 and t4,s5,t3 1a8: 01cbff33 and t5,s7,t3 000001ac <hd_cal_shift>: 1ac: 001a5293 srl t0,s4,0x1 1b0: 01fa9313 sll t1,s5,0x1f 1b4: 0062ea33 or s4,t0,t1 1b8: 001ada93 srl s5,s5,0x1 1bc: 001b5293 srl t0,s6,0x1 1c0: 01fb9313 sll t1,s7,0x1f 1c4: 0062eb33 or s6,t0,t1 1c8: 001bdb93 srl s7,s7,0x1 1cc: 01ee8463 beq t4,t5,1d4 <hd_check_loop> 1d0: 00198993 add s3,s3,1 000001d4 <hd_check_loop>: 1d4: fff90913 add s2,s2,-1 1d8: fa0918e3 bnez s2,188 <hd_cal_loop> 1dc: 00098513 mv a0,s3 1e0: f7dff06f j 15c <hd_func_end> 000001e4 <clz>: 1e4: ffc10113 add sp,sp,-4 1e8: 00112023 sw ra,0(sp) 1ec: 02058a63 beqz a1,220 <clz_lower_set_one> 000001f0 <clz_upper_set_one>: 1f0: 0015d313 srl t1,a1,0x1 1f4: 0065e5b3 or a1,a1,t1 1f8: 0025d313 srl t1,a1,0x2 1fc: 0065e5b3 or a1,a1,t1 200: 0045d313 srl t1,a1,0x4 204: 0065e5b3 or a1,a1,t1 208: 0085d313 srl t1,a1,0x8 20c: 0065e5b3 or a1,a1,t1 210: 0105d313 srl t1,a1,0x10 214: 0065e5b3 or a1,a1,t1 218: fff00513 li a0,-1 21c: 02c0006f j 248 <clz_count_ones> 00000220 <clz_lower_set_one>: 220: 00155293 srl t0,a0,0x1 224: 00556533 or a0,a0,t0 228: 00255293 srl t0,a0,0x2 22c: 00556533 or a0,a0,t0 230: 00455293 srl t0,a0,0x4 234: 00556533 or a0,a0,t0 238: 00855293 srl t0,a0,0x8 23c: 00556533 or a0,a0,t0 240: 01055293 srl t0,a0,0x10 244: 00556533 or a0,a0,t0 00000248 <clz_count_ones>: 248: 00155293 srl t0,a0,0x1 24c: 01f59313 sll t1,a1,0x1f 250: 0062e2b3 or t0,t0,t1 254: 0015d313 srl t1,a1,0x1 258: 555553b7 lui t2,0x55555 25c: 55538393 add t2,t2,1365 # 55555555 <msg2+0x555551c3> 260: 0072f2b3 and t0,t0,t2 264: 00737333 and t1,t1,t2 268: 00553e33 sltu t3,a0,t0 26c: 40550533 sub a0,a0,t0 270: 406585b3 sub a1,a1,t1 274: 41c585b3 sub a1,a1,t3 278: 00255293 srl t0,a0,0x2 27c: 01e59313 sll t1,a1,0x1e 280: 0062e2b3 or t0,t0,t1 284: 0025d313 srl t1,a1,0x2 288: 333333b7 lui t2,0x33333 28c: 33338393 add t2,t2,819 # 33333333 <msg2+0x33332fa1> 290: 0072f2b3 and t0,t0,t2 294: 00737333 and t1,t1,t2 298: 00757eb3 and t4,a0,t2 29c: 0075ff33 and t5,a1,t2 2a0: 01d28533 add a0,t0,t4 2a4: 00553e33 sltu t3,a0,t0 2a8: 01e305b3 add a1,t1,t5 2ac: 01c585b3 add a1,a1,t3 2b0: 00455293 srl t0,a0,0x4 2b4: 01c59313 sll t1,a1,0x1c 2b8: 0062e2b3 or t0,t0,t1 2bc: 0045d313 srl t1,a1,0x4 2c0: 00a282b3 add t0,t0,a0 2c4: 00a2be33 sltu t3,t0,a0 2c8: 00b30333 add t1,t1,a1 2cc: 01c30333 add t1,t1,t3 2d0: 0f0f13b7 lui t2,0xf0f1 2d4: f0f38393 add t2,t2,-241 # f0f0f0f <msg2+0xf0f0b7d> 2d8: 0072f533 and a0,t0,t2 2dc: 007375b3 and a1,t1,t2 2e0: 00855293 srl t0,a0,0x8 2e4: 01859313 sll t1,a1,0x18 2e8: 0062e2b3 or t0,t0,t1 2ec: 0085d313 srl t1,a1,0x8 2f0: 00550533 add a0,a0,t0 2f4: 00553e33 sltu t3,a0,t0 2f8: 006585b3 add a1,a1,t1 2fc: 01c585b3 add a1,a1,t3 300: 01055293 srl t0,a0,0x10 304: 01059313 sll t1,a1,0x10 308: 0062e2b3 or t0,t0,t1 30c: 0105d313 srl t1,a1,0x10 310: 00550533 add a0,a0,t0 314: 00553e33 sltu t3,a0,t0 318: 006585b3 add a1,a1,t1 31c: 01c585b3 add a1,a1,t3 320: 00058293 mv t0,a1 324: 00000313 li t1,0 328: 00550533 add a0,a0,t0 32c: 00553e33 sltu t3,a0,t0 330: 006585b3 add a1,a1,t1 334: 01c585b3 add a1,a1,t3 338: 07f57513 and a0,a0,127 33c: 04000293 li t0,64 340: 40a28533 sub a0,t0,a0 344: 00012083 lw ra,0(sp) 348: 00410113 add sp,sp,4 34c: 00008067 ret ``` ::: ### Optimize C code / Handwritten Assembly Code #### Handwritten Assembly Code version1 I delete the redundant code in `hd_cal_loop` of original code. ```diff= # hamming distance calculation (result save in a0, a1) hd_cal_loop: - # when the current digit larger than 32 - addi t2, zero, 32 - bgt s2, t2, hd_getLSB_upper # hd_getLSB_lower : and with 1 li t3, 0x00000001 and t4, s4, t3 and t5, s6, t3 j hd_cal_shift -hd_getLSB_upper: - # and with 1 - li t3, 0x00000001 - and t4, s5, t3 - and t5, s7, t3 hd_cal_shift: # (s5 s4) = x >> 1 srli t0, s4, 1 slli t1, s5, 31 or s4, t0, t1 # s4 >> 1 srli s5, s5, 1 # s5 >> 1 # (s7 s6) = x >> 1 srli t0, s6, 1 slli t1, s7, 31 or s6, t0, t1 # s6 >> 1 srli s7, s7, 1 # s7 >> 1 beq t4, t5, hd_check_loop addi s3, s3, 1 ``` #### C code version 1 I made some slight modifications to the code to make it more concise and efficient. ```diff= int HammingDistance(uint64_t x0, uint64_t x1){ int Hdist = 0; int16_t max_digit = 64 - (int16_t)count_leading_zeros((x0 > x1)? x0 : x1); + uint64_t c1 = x0 ^ x1; while(max_digit > 0){ - uint64_t c1 = x0 & 1; - uint64_t c2 = x1 & 1; - if(c1 != c2) Hdist += 1; - x0 = x0 >> 1; - x1 = x1 >> 1; + Hdist += c1 & 1; + c1 = c1 >> 1; max_digit -= 1; } return Hdist; } ``` #### Handwritten Assembly code version2 Based on the modified code, I made changes to my assembly code. ```diff= # hamming distance calculation (result save in a0, a1) +hd_cal_xor: + xor s4, s4, s6 + xor s5, s5, s7 #c1=(s5 s4) hd_cal_loop: # hd_getLSB_lower : and with 1 li t3, 0x00000001 and t4, s4, t3 - and t5, s6, t3 j hd_cal_shift hd_cal_shift: # (s5 s4) = x >> 1 srli t0, s4, 1 slli t1, s5, 31 or s4, t0, t1 # s4 >> 1 srli s5, s5, 1 # s5 >> 1 - #(s7 s6) = x >> 1 - srli t0, s6, 1 - slli t1, s7, 31 - or s6, t0, t1 # s6 >> 1 - srli s7, s7, 1 # s7 >> 1 - beq t4, t5, hd_check_loop + bne t4, t3, hd_check_loop addi s3, s3, 1 hd_check_loop: addi s2, s2, -1 bne s2, zero, hd_cal_loop mv a0, s3 # save return value to a0 j hd_func_end ``` | Cycles | Original C code | C code version1 | | ------ | --------------- | --------------- | | -O0 | 11190 | 8594 | | -O1 | 7198 | 6614 | | -O2 | 7435 | 6851 | | -O3 | 7363 | 6793 | | -Os | 7400 | 6732 | | -Ofast | 7363 | 6793 | | | Handwritten assembly code | Assembly code v1 | Assembly code v2 | |:------:|:---------------------:|:----------------:|:----------------:| | Cycles | 2729 | 2526 | 1942 | ## Reference * [Options That Control Optimization](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) * [text, data, bss, and dec](https://mirzafahad.github.io/2021-05-08-text-data-bss/) * [An Introduction to Makefiles](https://www.gnu.org/software/make/manual/html_node/Introduction.html)