# Assignment2: RISC-V Toolchain
contributed by < [`Yao1201`](https://github.com/Yao1201) >
## Environment Setting
The setup for the virtual machine and Ubuntu Linux environment is well-documented with clear explanations and step-by-step instructions on [Lab2: RISC-V RV32I[MACF] emulator with ELF support](https://hackmd.io/@sysprog/SJAR5XMmi). In addition to that, it also provides instructions on using the GNU Toolchain .
## Question selection
I choose the [Calculate the Hamming Distance using Counting Leading Zeros](https://hackmd.io/@O6C2C3zQQBanDM55QRZ7DQ/Lab1_RV32I_assembly) from [author_YP](https://hackmd.io/@O6C2C3zQQBanDM55QRZ7DQ)
## Motivation
The reason I chose this topic is that in communication systems, Hamming distance is a crucial piece of information during the decoding stage. By identifying the shortest Hamming distance, the bit error rate in receiving signals can be minimized. In other words, it maximizes the probability of receiving the correct signal.
## Compile C code using RISC-V gcc
### Original C code from [author_YP](https://hackmd.io/@O6C2C3zQQBanDM55QRZ7DQ)
```c
#include <stdio.h>
#include <stdint.h>
uint64_t test1_x0 = 0x0000000000100000;
uint64_t test1_x1 = 0x00000000000FFFFF;
uint64_t test2_x0 = 0x0000000000000001;
uint64_t test2_x1 = 0x7FFFFFFFFFFFFFFE;
uint64_t test3_x0 = 0x000000028370228F;
uint64_t test3_x1 = 0x000000028370228F;
uint16_t count_leading_zeros(uint64_t x){
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x |= (x >> 32);
/* count ones (population count) */
x -= ((x >> 1) & 0x5555555555555555);
x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
x += (x >> 8);
x += (x >> 16);
x += (x >> 32);
return (64 - (x & 0x7f));
}
int HammingDistance(uint64_t x0, uint64_t x1){
int Hdist = 0;
int16_t max_digit = 64 - (int16_t)count_leading_zeros((x0 > x1)? x0 : x1);
while(max_digit > 0){
uint64_t c1 = x0 & 1;
uint64_t c2 = x1 & 1;
if(c1 != c2) Hdist += 1;
x0 = x0 >> 1;
x1 = x1 >> 1;
max_digit -= 1;
}
return Hdist;
}
int main(){
printf("%d\n", HammingDistance(test1_x0, test1_x1));
printf("%d\n", HammingDistance(test2_x0, test2_x1));
printf("%d\n", HammingDistance(test3_x0, test3_x1));
return 0;
}
```
In order to obtain the number of cylces, I added a `get_cycles` function to the main function in my C code:
```diff
int main(){
+ uint64_t instret = get_instret();
+ uint64_t oldcount = get_cycles();
printf("Hamming Distance = %d\n", HammingDistance(test1_x0, test1_x1));
printf("Hamming Distance = %d\n", HammingDistance(test2_x0, test2_x1));
printf("Hamming Distance = %d\n", HammingDistance(test3_x0, test3_x1));
+ uint64_t cyclecount = get_cycles() - oldcount;
+ printf("cycle count: %u\n", (unsigned int) cyclecount);
}
```
`get_cycles` is shown as:
```c
.text
.globl get_cycles
.align 2
get_cycles:
csrr a1, cycleh
csrr a0, cycle
csrr a2, cycleh
bne a1, a2, get_cycles
ret
.size get_cycles,.-get_cycles
```
### Makefile
To facilitate compilation, we can write a Makefile:
```shell
.PHONY: clean
include ../../mk/toolchain.mk
ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O1 -Wall
OBJS =\
Hammingdist.o\
getcycles.o\
getinstret.o
BIN = Hammingdist.elf
%.o: %.s
riscv-none-elf-gcc $(ASFLAGS) -c -o $@ $<
%.o: %.c
riscv-none-elf-gcc $(ASFLAGS) $(CFLAGS) -c -o $@ $<
all: Hammingdist.elf
Hammingdist.elf: $(OBJS)
riscv-none-elf-gcc -o $@ $^
clean:
$(RM) $(BIN) $(OBJS
```
* Modify `CFLAGS` according to optimized level (O0, O1, O2, O3, Os, Ofast)
* Because we use the `CSR` command, we need to add `_zicsr` in `ASFLAGS` additionally.
After the aforementioned modifications, we finally ready to start compiling our C file.
### O0 Optimized
No optimizations are applied. These is the default compilation options.
```c1!
$ ../../build/rv32emu Hammingdist.elf
Hamming Distance = 21
Hamming Distance = 63
Hamming Distance = 0
cycle count: 11190
instret: 2ca
inferior exit code 0
$ riscv-none-elf-readelf -h Hammingdist.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100c4
Start of program headers: 52 (bytes into file)
Start of section headers: 69644 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
$ riscv-none-elf-size Hammingdist.elf
text data bss dec hex filename
52906 1928 1528 56362 dc2a Hammingdist.elf
```
In the Disassembled code, I primarily choose to analyze function `CLZ` and `Hammingdistance` because these two sections play the most critical roles in the program.
:::spoiler **Disassembled**
```c
0010148 <count_leading_zeros>:
10148: f7010113 add sp,sp,-144
1014c: 08812623 sw s0,140(sp)
10150: 09212423 sw s2,136(sp)
10154: 09312223 sw s3,132(sp)
10158: 09412023 sw s4,128(sp)
1015c: 07512e23 sw s5,124(sp)
10160: 07612c23 sw s6,120(sp)
10164: 07712a23 sw s7,116(sp)
10168: 07812823 sw s8,112(sp)
1016c: 07912623 sw s9,108(sp)
10170: 07a12423 sw s10,104(sp)
10174: 07b12223 sw s11,100(sp)
10178: 09010413 add s0,sp,144
1017c: fca42423 sw a0,-56(s0)
10180: fcb42623 sw a1,-52(s0)
10184: fcc42783 lw a5,-52(s0)
10188: 01f79713 sll a4,a5,0x1f
1018c: fc842783 lw a5,-56(s0)
10190: 0017d613 srl a2,a5,0x1
10194: 00c76633 or a2,a4,a2
10198: fcc42783 lw a5,-52(s0)
1019c: 0017d693 srl a3,a5,0x1
101a0: fc842783 lw a5,-56(s0)
101a4: 00c7ee33 or t3,a5,a2
101a8: fcc42783 lw a5,-52(s0)
101ac: 00d7eeb3 or t4,a5,a3
101b0: fdc42423 sw t3,-56(s0)
101b4: fdd42623 sw t4,-52(s0)
101b8: fcc42783 lw a5,-52(s0)
101bc: 01e79713 sll a4,a5,0x1e
101c0: fc842783 lw a5,-56(s0)
101c4: 0027d813 srl a6,a5,0x2
101c8: 01076833 or a6,a4,a6
101cc: fcc42783 lw a5,-52(s0)
101d0: 0027d893 srl a7,a5,0x2
101d4: fc842783 lw a5,-56(s0)
101d8: 0107e7b3 or a5,a5,a6
101dc: f8f42823 sw a5,-112(s0)
101e0: fcc42783 lw a5,-52(s0)
101e4: 0117e7b3 or a5,a5,a7
101e8: f8f42a23 sw a5,-108(s0)
101ec: f9042783 lw a5,-112(s0)
101f0: f9442803 lw a6,-108(s0)
101f4: fcf42423 sw a5,-56(s0)
101f8: fd042623 sw a6,-52(s0)
101fc: fcc42783 lw a5,-52(s0)
10200: 01c79713 sll a4,a5,0x1c
10204: fc842783 lw a5,-56(s0)
10208: 0047d313 srl t1,a5,0x4
1020c: 00676333 or t1,a4,t1
10210: fcc42783 lw a5,-52(s0)
10214: 0047d393 srl t2,a5,0x4
10218: fc842783 lw a5,-56(s0)
1021c: 0067e7b3 or a5,a5,t1
10220: f8f42423 sw a5,-120(s0)
10224: fcc42783 lw a5,-52(s0)
10228: 0077e7b3 or a5,a5,t2
1022c: f8f42623 sw a5,-116(s0)
10230: f8842783 lw a5,-120(s0)
10234: f8c42803 lw a6,-116(s0)
10238: fcf42423 sw a5,-56(s0)
1023c: fd042623 sw a6,-52(s0)
10240: fcc42783 lw a5,-52(s0)
10244: 01879713 sll a4,a5,0x18
10248: fc842783 lw a5,-56(s0)
1024c: 0087d913 srl s2,a5,0x8
10250: 01276933 or s2,a4,s2
10254: fcc42783 lw a5,-52(s0)
10258: 0087d993 srl s3,a5,0x8
1025c: fc842783 lw a5,-56(s0)
10260: 0127e7b3 or a5,a5,s2
10264: f8f42023 sw a5,-128(s0)
10268: fcc42783 lw a5,-52(s0)
1026c: 0137e7b3 or a5,a5,s3
10270: f8f42223 sw a5,-124(s0)
10274: f8042783 lw a5,-128(s0)
10278: f8442803 lw a6,-124(s0)
1027c: fcf42423 sw a5,-56(s0)
10280: fd042623 sw a6,-52(s0)
10284: fcc42783 lw a5,-52(s0)
10288: 01079713 sll a4,a5,0x10
1028c: fc842783 lw a5,-56(s0)
10290: 0107d793 srl a5,a5,0x10
10294: fcf42023 sw a5,-64(s0)
10298: fc042783 lw a5,-64(s0)
1029c: 00f767b3 or a5,a4,a5
102a0: fcf42023 sw a5,-64(s0)
102a4: fcc42783 lw a5,-52(s0)
102a8: 0107d793 srl a5,a5,0x10
102ac: fcf42223 sw a5,-60(s0)
102b0: fc842783 lw a5,-56(s0)
102b4: fc042603 lw a2,-64(s0)
102b8: fc442683 lw a3,-60(s0)
102bc: 00060713 mv a4,a2
102c0: 00e7e7b3 or a5,a5,a4
102c4: f6f42c23 sw a5,-136(s0)
102c8: fcc42783 lw a5,-52(s0)
102cc: 00068713 mv a4,a3
102d0: 00e7e7b3 or a5,a5,a4
102d4: f6f42e23 sw a5,-132(s0)
102d8: f7842783 lw a5,-136(s0)
102dc: f7c42803 lw a6,-132(s0)
102e0: fcf42423 sw a5,-56(s0)
102e4: fd042623 sw a6,-52(s0)
102e8: fcc42783 lw a5,-52(s0)
102ec: 0007d793 srl a5,a5,0x0
102f0: faf42c23 sw a5,-72(s0)
102f4: fa042e23 sw zero,-68(s0)
102f8: fc842783 lw a5,-56(s0)
102fc: fb842603 lw a2,-72(s0)
10300: fbc42683 lw a3,-68(s0)
10304: 00060713 mv a4,a2
10308: 00e7e7b3 or a5,a5,a4
1030c: f6f42823 sw a5,-144(s0)
10310: fcc42783 lw a5,-52(s0)
10314: 00068713 mv a4,a3
10318: 00e7e7b3 or a5,a5,a4
1031c: f6f42a23 sw a5,-140(s0)
10320: f7042783 lw a5,-144(s0)
10324: f7442803 lw a6,-140(s0)
10328: fcf42423 sw a5,-56(s0)
1032c: fd042623 sw a6,-52(s0)
10330: fcc42783 lw a5,-52(s0)
10334: 01f79793 sll a5,a5,0x1f
10338: fc842703 lw a4,-56(s0)
1033c: 00175d13 srl s10,a4,0x1
10340: 01a7ed33 or s10,a5,s10
10344: fcc42783 lw a5,-52(s0)
10348: 0017dd93 srl s11,a5,0x1
1034c: 555557b7 lui a5,0x55555
10350: 55578793 add a5,a5,1365 # 55555555 <__BSS_END__+0x555377d1>
10354: 00fd77b3 and a5,s10,a5
10358: faf42823 sw a5,-80(s0)
1035c: 555557b7 lui a5,0x55555
10360: 55578793 add a5,a5,1365 # 55555555 <__BSS_END__+0x555377d1>
10364: 00fdf7b3 and a5,s11,a5
10368: faf42a23 sw a5,-76(s0)
1036c: fc842603 lw a2,-56(s0)
10370: fcc42683 lw a3,-52(s0)
10374: fb042803 lw a6,-80(s0)
10378: fb442883 lw a7,-76(s0)
1037c: 00080593 mv a1,a6
10380: 40b60733 sub a4,a2,a1
10384: 00070593 mv a1,a4
10388: 00b635b3 sltu a1,a2,a1
1038c: 00088513 mv a0,a7
10390: 40a687b3 sub a5,a3,a0
10394: 40b786b3 sub a3,a5,a1
10398: 00068793 mv a5,a3
1039c: fce42423 sw a4,-56(s0)
103a0: fcf42623 sw a5,-52(s0)
103a4: fcc42783 lw a5,-52(s0)
103a8: 01e79793 sll a5,a5,0x1e
103ac: fc842703 lw a4,-56(s0)
103b0: 00275c13 srl s8,a4,0x2
103b4: 0187ec33 or s8,a5,s8
103b8: fcc42783 lw a5,-52(s0)
103bc: 0027dc93 srl s9,a5,0x2
103c0: 333337b7 lui a5,0x33333
103c4: 33378793 add a5,a5,819 # 33333333 <__BSS_END__+0x333155af>
103c8: 00fc77b3 and a5,s8,a5
103cc: faf42423 sw a5,-88(s0)
103d0: 333337b7 lui a5,0x33333
103d4: 33378793 add a5,a5,819 # 33333333 <__BSS_END__+0x333155af>
103d8: 00fcf7b3 and a5,s9,a5
103dc: faf42623 sw a5,-84(s0)
103e0: fc842703 lw a4,-56(s0)
103e4: 333337b7 lui a5,0x33333
103e8: 33378793 add a5,a5,819 # 33333333 <__BSS_END__+0x333155af>
103ec: 00f777b3 and a5,a4,a5
103f0: faf42023 sw a5,-96(s0)
103f4: fcc42703 lw a4,-52(s0)
103f8: 333337b7 lui a5,0x33333
103fc: 33378793 add a5,a5,819 # 33333333 <__BSS_END__+0x333155af>
10400: 00f777b3 and a5,a4,a5
10404: faf42223 sw a5,-92(s0)
10408: fa842503 lw a0,-88(s0)
1040c: fac42583 lw a1,-84(s0)
10410: 00050693 mv a3,a0
10414: fa042803 lw a6,-96(s0)
10418: fa442883 lw a7,-92(s0)
1041c: 00080613 mv a2,a6
10420: 00c68733 add a4,a3,a2
10424: 00070693 mv a3,a4
10428: 00050613 mv a2,a0
1042c: 00c6b6b3 sltu a3,a3,a2
10430: 00058613 mv a2,a1
10434: 00088593 mv a1,a7
10438: 00b607b3 add a5,a2,a1
1043c: 00f686b3 add a3,a3,a5
10440: 00068793 mv a5,a3
10444: fce42423 sw a4,-56(s0)
10448: fcf42623 sw a5,-52(s0)
1044c: fcc42783 lw a5,-52(s0)
10450: 01c79793 sll a5,a5,0x1c
10454: fc842703 lw a4,-56(s0)
10458: 00475f13 srl t5,a4,0x4
1045c: 01e7ef33 or t5,a5,t5
10460: fcc42783 lw a5,-52(s0)
10464: 0047df93 srl t6,a5,0x4
10468: fc842603 lw a2,-56(s0)
1046c: fcc42683 lw a3,-52(s0)
10470: 00cf0733 add a4,t5,a2
10474: 00070593 mv a1,a4
10478: 01e5b5b3 sltu a1,a1,t5
1047c: 00df87b3 add a5,t6,a3
10480: 00f586b3 add a3,a1,a5
10484: 00068793 mv a5,a3
10488: 0f0f16b7 lui a3,0xf0f1
1048c: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b>
10490: 00d776b3 and a3,a4,a3
10494: fcd42423 sw a3,-56(s0)
10498: 0f0f16b7 lui a3,0xf0f1
1049c: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b>
104a0: 00d7f7b3 and a5,a5,a3
104a4: fcf42623 sw a5,-52(s0)
104a8: fcc42783 lw a5,-52(s0)
104ac: 01879793 sll a5,a5,0x18
104b0: fc842703 lw a4,-56(s0)
104b4: 00875b13 srl s6,a4,0x8
104b8: 0167eb33 or s6,a5,s6
104bc: fcc42783 lw a5,-52(s0)
104c0: 0087db93 srl s7,a5,0x8
104c4: fc842603 lw a2,-56(s0)
104c8: fcc42683 lw a3,-52(s0)
104cc: 01660733 add a4,a2,s6
104d0: 00070593 mv a1,a4
104d4: 00c5b5b3 sltu a1,a1,a2
104d8: 017687b3 add a5,a3,s7
104dc: 00f586b3 add a3,a1,a5
104e0: 00068793 mv a5,a3
104e4: fce42423 sw a4,-56(s0)
104e8: fcf42623 sw a5,-52(s0)
104ec: fcc42783 lw a5,-52(s0)
104f0: 01079793 sll a5,a5,0x10
104f4: fc842703 lw a4,-56(s0)
104f8: 01075a13 srl s4,a4,0x10
104fc: 0147ea33 or s4,a5,s4
10500: fcc42783 lw a5,-52(s0)
10504: 0107da93 srl s5,a5,0x10
10508: fc842603 lw a2,-56(s0)
1050c: fcc42683 lw a3,-52(s0)
10510: 01460733 add a4,a2,s4
10514: 00070593 mv a1,a4
10518: 00c5b5b3 sltu a1,a1,a2
1051c: 015687b3 add a5,a3,s5
10520: 00f586b3 add a3,a1,a5
10524: 00068793 mv a5,a3
10528: fce42423 sw a4,-56(s0)
1052c: fcf42623 sw a5,-52(s0)
10530: fcc42783 lw a5,-52(s0)
10534: 0007d793 srl a5,a5,0x0
10538: f8f42c23 sw a5,-104(s0)
1053c: f8042e23 sw zero,-100(s0)
10540: fc842603 lw a2,-56(s0)
10544: fcc42683 lw a3,-52(s0)
10548: f9842803 lw a6,-104(s0)
1054c: f9c42883 lw a7,-100(s0)
10550: 00080593 mv a1,a6
10554: 00b60733 add a4,a2,a1
10558: 00070593 mv a1,a4
1055c: 00c5b5b3 sltu a1,a1,a2
10560: 00088513 mv a0,a7
10564: 00a687b3 add a5,a3,a0
10568: 00f586b3 add a3,a1,a5
1056c: 00068793 mv a5,a3
10570: fce42423 sw a4,-56(s0)
10574: fcf42623 sw a5,-52(s0)
10578: fc845783 lhu a5,-56(s0)
1057c: 07f7f793 and a5,a5,127
10580: 01079793 sll a5,a5,0x10
10584: 0107d793 srl a5,a5,0x10
10588: 04000713 li a4,64
1058c: 40f707b3 sub a5,a4,a5
10590: 01079793 sll a5,a5,0x10
10594: 0107d793 srl a5,a5,0x10
10598: 00078513 mv a0,a5
1059c: 08c12403 lw s0,140(sp)
105a0: 08812903 lw s2,136(sp)
105a4: 08412983 lw s3,132(sp)
105a8: 08012a03 lw s4,128(sp)
105ac: 07c12a83 lw s5,124(sp)
105b0: 07812b03 lw s6,120(sp)
105b4: 07412b83 lw s7,116(sp)
105b8: 07012c03 lw s8,112(sp)
105bc: 06c12c83 lw s9,108(sp)
105c0: 06812d03 lw s10,104(sp)
105c4: 06412d83 lw s11,100(sp)
105c8: 09010113 add sp,sp,144
105cc: 00008067 ret
000105d0 <HammingDistance>:
105d0: fa010113 add sp,sp,-96
105d4: 04112e23 sw ra,92(sp)
105d8: 04812c23 sw s0,88(sp)
105dc: 05212a23 sw s2,84(sp)
105e0: 05312823 sw s3,80(sp)
105e4: 05412623 sw s4,76(sp)
105e8: 05512423 sw s5,72(sp)
105ec: 05612223 sw s6,68(sp)
105f0: 05712023 sw s7,64(sp)
105f4: 03812e23 sw s8,60(sp)
105f8: 03912c23 sw s9,56(sp)
105fc: 06010413 add s0,sp,96
10600: faa42423 sw a0,-88(s0)
10604: fab42623 sw a1,-84(s0)
10608: fac42023 sw a2,-96(s0)
1060c: fad42223 sw a3,-92(s0)
10610: fc042623 sw zero,-52(s0)
10614: fa842603 lw a2,-88(s0)
10618: fac42683 lw a3,-84(s0)
1061c: fa042703 lw a4,-96(s0)
10620: fa442783 lw a5,-92(s0)
10624: 00068513 mv a0,a3
10628: 00078593 mv a1,a5
1062c: 00a5ee63 bltu a1,a0,10648 <HammingDistance+0x78>
10630: 00068513 mv a0,a3
10634: 00078593 mv a1,a5
10638: 00b51c63 bne a0,a1,10650 <HammingDistance+0x80>
1063c: 00060513 mv a0,a2
10640: 00070593 mv a1,a4
10644: 00a5f663 bgeu a1,a0,10650 <HammingDistance+0x80>
10648: 00060713 mv a4,a2
1064c: 00068793 mv a5,a3
10650: 00070513 mv a0,a4
10654: 00078593 mv a1,a5
10658: af1ff0ef jal 10148 <count_leading_zeros>
1065c: 00050793 mv a5,a0
10660: 00078713 mv a4,a5
10664: 04000793 li a5,64
10668: 40e787b3 sub a5,a5,a4
1066c: 01079793 sll a5,a5,0x10
10670: 0107d793 srl a5,a5,0x10
10674: fcf41523 sh a5,-54(s0)
10678: 0b40006f j 1072c <HammingDistance+0x15c>
1067c: fa842783 lw a5,-88(s0)
10680: 0017fb13 and s6,a5,1
10684: fac42783 lw a5,-84(s0)
10688: 0007fb93 and s7,a5,0
1068c: fd642023 sw s6,-64(s0)
10690: fd742223 sw s7,-60(s0)
10694: fa042783 lw a5,-96(s0)
10698: 0017fc13 and s8,a5,1
1069c: fa442783 lw a5,-92(s0)
106a0: 0007fc93 and s9,a5,0
106a4: fb842c23 sw s8,-72(s0)
106a8: fb942e23 sw s9,-68(s0)
106ac: fc042703 lw a4,-64(s0)
106b0: fb842783 lw a5,-72(s0)
106b4: 00f71863 bne a4,a5,106c4 <HammingDistance+0xf4>
106b8: fc442703 lw a4,-60(s0)
106bc: fbc42783 lw a5,-68(s0)
106c0: 00f70863 beq a4,a5,106d0 <HammingDistance+0x100>
106c4: fcc42783 lw a5,-52(s0)
106c8: 00178793 add a5,a5,1
106cc: fcf42623 sw a5,-52(s0)
106d0: fac42783 lw a5,-84(s0)
106d4: 01f79793 sll a5,a5,0x1f
106d8: fa842703 lw a4,-88(s0)
106dc: 00175913 srl s2,a4,0x1
106e0: 0127e933 or s2,a5,s2
106e4: fac42783 lw a5,-84(s0)
106e8: 0017d993 srl s3,a5,0x1
106ec: fb242423 sw s2,-88(s0)
106f0: fb342623 sw s3,-84(s0)
106f4: fa442783 lw a5,-92(s0)
106f8: 01f79793 sll a5,a5,0x1f
106fc: fa042703 lw a4,-96(s0)
10700: 00175a13 srl s4,a4,0x1
10704: 0147ea33 or s4,a5,s4
10708: fa442783 lw a5,-92(s0)
1070c: 0017da93 srl s5,a5,0x1
10710: fb442023 sw s4,-96(s0)
10714: fb542223 sw s5,-92(s0)
10718: fca45783 lhu a5,-54(s0)
1071c: fff78793 add a5,a5,-1
10720: 01079793 sll a5,a5,0x10
10724: 0107d793 srl a5,a5,0x10
10728: fcf41523 sh a5,-54(s0)
1072c: fca41783 lh a5,-54(s0)
10730: f4f046e3 bgtz a5,1067c <HammingDistance+0xac>
10734: fcc42783 lw a5,-52(s0)
10738: 00078513 mv a0,a5
1073c: 05c12083 lw ra,92(sp)
10740: 05812403 lw s0,88(sp)
10744: 05412903 lw s2,84(sp)
10748: 05012983 lw s3,80(sp)
1074c: 04c12a03 lw s4,76(sp)
10750: 04812a83 lw s5,72(sp)
10754: 04412b03 lw s6,68(sp)
10758: 04012b83 lw s7,64(sp)
1075c: 03c12c03 lw s8,60(sp)
10760: 03812c83 lw s9,56(sp)
10764: 06010113 add sp,sp,96
10768: 00008067 ret
```
:::
#### Observation
* Register Usage
| Register | CLZ | Hammingdistnce |
|:--------:|:--------------:|:--------------:|
| a-type | 7 (a0-a6) | 6(a0-a5) |
| s-type | 11(s1, s2-s11) | 9(s0, s2-s9) |
| t-type | 2(t5, t6) | 0 |
| other | sp | ra, sp |
* Stack Usage
| Stack | CLZ | Hammingdistance |
| ----- | --- |:---------------:|
| Usage | 144 | 96 |
* Lines of code/`lw`/`sw`
| Line of code | lw | sw |
|:------------:| --- | --- |
| 371 | 120 | 84 |
The code has 371 lines. It's too lengthy and it has a bunch of `lw`/`sw`. Besides, a lot of `mv`/`lw`/`sw` are redundantm, resulting in low efficiency code.
### O1 Optimized
The compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
```shell
$ ../../build/rv32emu Hammingdist.elf
Hamming Distance = 21
Hamming Distance = 63
Hamming Distance = 0
cycle count: 7198
instret: 2cb
inferior exit code 0
$ riscv-none-elf-readelf -h Hammingdist.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100c2
Start of program headers: 52 (bytes into file)
Start of section headers: 69540 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
$ riscv-none-elf-size Hammingdist.elf
text data bss dec hex filename
51790 1928 1528 55246 d7ce Hammingdist.elf
```
:::spoiler **Disassembled**
```c
00010144 <count_leading_zeros>:
10144: 01f59713 sll a4,a1,0x1f
10148: 00155793 srl a5,a0,0x1
1014c: 00f767b3 or a5,a4,a5
10150: 0015d713 srl a4,a1,0x1
10154: 00a7e533 or a0,a5,a0
10158: 00b765b3 or a1,a4,a1
1015c: 01e59713 sll a4,a1,0x1e
10160: 00255793 srl a5,a0,0x2
10164: 00f767b3 or a5,a4,a5
10168: 0025d613 srl a2,a1,0x2
1016c: 00a7e533 or a0,a5,a0
10170: 00b66633 or a2,a2,a1
10174: 01c61713 sll a4,a2,0x1c
10178: 00455793 srl a5,a0,0x4
1017c: 00f767b3 or a5,a4,a5
10180: 00465693 srl a3,a2,0x4
10184: 00a7e733 or a4,a5,a0
10188: 00c6e6b3 or a3,a3,a2
1018c: 01869613 sll a2,a3,0x18
10190: 00875793 srl a5,a4,0x8
10194: 00f667b3 or a5,a2,a5
10198: 0086d613 srl a2,a3,0x8
1019c: 00e7e7b3 or a5,a5,a4
101a0: 00d66633 or a2,a2,a3
101a4: 01061713 sll a4,a2,0x10
101a8: 0107d693 srl a3,a5,0x10
101ac: 00d766b3 or a3,a4,a3
101b0: 01065713 srl a4,a2,0x10
101b4: 00f6e6b3 or a3,a3,a5
101b8: 00c76733 or a4,a4,a2
101bc: 00d766b3 or a3,a4,a3
101c0: 01f71613 sll a2,a4,0x1f
101c4: 0016d793 srl a5,a3,0x1
101c8: 00f667b3 or a5,a2,a5
101cc: 00175593 srl a1,a4,0x1
101d0: 55555637 lui a2,0x55555
101d4: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x555377d1>
101d8: 00c7f7b3 and a5,a5,a2
101dc: 00c5f633 and a2,a1,a2
101e0: 40f687b3 sub a5,a3,a5
101e4: 00f6b6b3 sltu a3,a3,a5
101e8: 40c70733 sub a4,a4,a2
101ec: 40d70733 sub a4,a4,a3
101f0: 01e71613 sll a2,a4,0x1e
101f4: 0027d693 srl a3,a5,0x2
101f8: 00d666b3 or a3,a2,a3
101fc: 00275593 srl a1,a4,0x2
10200: 33333637 lui a2,0x33333
10204: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155af>
10208: 00c6f6b3 and a3,a3,a2
1020c: 00c5f5b3 and a1,a1,a2
10210: 00c7f7b3 and a5,a5,a2
10214: 00c77733 and a4,a4,a2
10218: 00f687b3 add a5,a3,a5
1021c: 00d7b6b3 sltu a3,a5,a3
10220: 00e58733 add a4,a1,a4
10224: 00e686b3 add a3,a3,a4
10228: 01c69613 sll a2,a3,0x1c
1022c: 0047d713 srl a4,a5,0x4
10230: 00e66733 or a4,a2,a4
10234: 0046d613 srl a2,a3,0x4
10238: 00f707b3 add a5,a4,a5
1023c: 00e7b733 sltu a4,a5,a4
10240: 00d606b3 add a3,a2,a3
10244: 00d70733 add a4,a4,a3
10248: 0f0f16b7 lui a3,0xf0f1
1024c: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b>
10250: 00d7f7b3 and a5,a5,a3
10254: 00d77733 and a4,a4,a3
10258: 01871613 sll a2,a4,0x18
1025c: 0087d693 srl a3,a5,0x8
10260: 00d666b3 or a3,a2,a3
10264: 00875613 srl a2,a4,0x8
10268: 00f687b3 add a5,a3,a5
1026c: 00d7b6b3 sltu a3,a5,a3
10270: 00e60733 add a4,a2,a4
10274: 00e686b3 add a3,a3,a4
10278: 01069613 sll a2,a3,0x10
1027c: 0107d713 srl a4,a5,0x10
10280: 00e66733 or a4,a2,a4
10284: 0106d613 srl a2,a3,0x10
10288: 00f707b3 add a5,a4,a5
1028c: 00e7b733 sltu a4,a5,a4
10290: 00d606b3 add a3,a2,a3
10294: 00d70733 add a4,a4,a3
10298: 00f70733 add a4,a4,a5
1029c: 07f77713 and a4,a4,127
102a0: 04000513 li a0,64
102a4: 40e50533 sub a0,a0,a4
102a8: 01051513 sll a0,a0,0x10
102ac: 01055513 srl a0,a0,0x10
102b0: 00008067 ret
000102b4 <HammingDistance>:
102b4: fe010113 add sp,sp,-32
102b8: 00112e23 sw ra,28(sp)
102bc: 00812c23 sw s0,24(sp)
102c0: 00912a23 sw s1,20(sp)
102c4: 01212823 sw s2,16(sp)
102c8: 01312623 sw s3,12(sp)
102cc: 00050493 mv s1,a0
102d0: 00058913 mv s2,a1
102d4: 00060413 mv s0,a2
102d8: 00068993 mv s3,a3
102dc: 00b6ea63 bltu a3,a1,102f0 <HammingDistance+0x3c>
102e0: 00060513 mv a0,a2
102e4: 00068593 mv a1,a3
102e8: 00d91863 bne s2,a3,102f8 <HammingDistance+0x44>
102ec: 00967663 bgeu a2,s1,102f8 <HammingDistance+0x44>
102f0: 00048513 mv a0,s1
102f4: 00090593 mv a1,s2
102f8: e4dff0ef jal 10144 <count_leading_zeros>
102fc: 04000793 li a5,64
10300: 40a787b3 sub a5,a5,a0
10304: 01079793 sll a5,a5,0x10
10308: 4107d793 sra a5,a5,0x10
1030c: 06f05063 blez a5,1036c <HammingDistance+0xb8>
10310: 00000513 li a0,0
10314: 0084c733 xor a4,s1,s0
10318: 00177713 and a4,a4,1
1031c: 00e50533 add a0,a0,a4
10320: 01f91713 sll a4,s2,0x1f
10324: 0014d493 srl s1,s1,0x1
10328: 009764b3 or s1,a4,s1
1032c: 00195913 srl s2,s2,0x1
10330: 01f99713 sll a4,s3,0x1f
10334: 00145413 srl s0,s0,0x1
10338: 00876433 or s0,a4,s0
1033c: 0019d993 srl s3,s3,0x1
10340: fff78793 add a5,a5,-1
10344: 01079793 sll a5,a5,0x10
10348: 4107d793 sra a5,a5,0x10
1034c: fc0794e3 bnez a5,10314 <HammingDistance+0x60>
10350: 01c12083 lw ra,28(sp)
10354: 01812403 lw s0,24(sp)
10358: 01412483 lw s1,20(sp)
1035c: 01012903 lw s2,16(sp)
10360: 00c12983 lw s3,12(sp)
10364: 02010113 add sp,sp,32
10368: 00008067 ret
1036c: 00000513 li a0,0
10370: fe1ff06f j 10350 <HammingDistance+0x9c>
```
:::
#### Observation
* Register Usage
| Register | CLZ | Hammingdistnce |
|:--------:|:--------------:|:--------------:|
| a-type | 6 (a0-a5) | 6(a0-a5) |
| s-type | 0 | 4(s0-s3) |
| t-type | 0 | 0 |
| other | 0 | ra, sp |
* Stack Usage
| Stack | CLZ | Hammingdistance |
| ----- | --- |:---------------:|
| Usage | 0 | 32 |
* Lines of code/`lw`/`sw`
| Line of code | lw | sw |
|:------------:| --- | --- |
| 144 | 5 | 5 |
Compare to O0 optimaztion, the number of `line of code`, `lw`, `sw` are decreased greatly. Besides, the usage of register are reduced too.
### O2 Optimized
Optimize even more.
```c
$ ../../build/rv32emu Hammingdist.elf
Hamming Distance = 21
Hamming Distance = 63
Hamming Distance = 0
cycle count: 7435
instret: 2cb
inferior exit code 0
$ riscv-none-elf-readelf -h Hammingdist.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x1018a
Start of program headers: 52 (bytes into file)
Start of section headers: 69564 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
$ riscv-none-elf-size Hammingdist.elf
text data bss dec hex filename
51794 1928 1528 55250 d7d2 Hammingdist.elf
```
:::spoiler **Disaasemble**
```c
0001020c <count_leading_zeros>:
1020c: 01f59713 sll a4,a1,0x1f
10210: 00155793 srl a5,a0,0x1
10214: 00f767b3 or a5,a4,a5
10218: 0015d713 srl a4,a1,0x1
1021c: 00b765b3 or a1,a4,a1
10220: 00a7e533 or a0,a5,a0
10224: 01e59713 sll a4,a1,0x1e
10228: 00255793 srl a5,a0,0x2
1022c: 00f767b3 or a5,a4,a5
10230: 0025d713 srl a4,a1,0x2
10234: 00b76733 or a4,a4,a1
10238: 00a7e533 or a0,a5,a0
1023c: 01c71693 sll a3,a4,0x1c
10240: 00455793 srl a5,a0,0x4
10244: 00f6e7b3 or a5,a3,a5
10248: 00475693 srl a3,a4,0x4
1024c: 00e6e6b3 or a3,a3,a4
10250: 00a7e733 or a4,a5,a0
10254: 01869613 sll a2,a3,0x18
10258: 00875793 srl a5,a4,0x8
1025c: 00f667b3 or a5,a2,a5
10260: 0086d613 srl a2,a3,0x8
10264: 00d66633 or a2,a2,a3
10268: 00e7e7b3 or a5,a5,a4
1026c: 0107d693 srl a3,a5,0x10
10270: 01061713 sll a4,a2,0x10
10274: 00d766b3 or a3,a4,a3
10278: 01065713 srl a4,a2,0x10
1027c: 00c76733 or a4,a4,a2
10280: 00f6e6b3 or a3,a3,a5
10284: 00d766b3 or a3,a4,a3
10288: 01f71593 sll a1,a4,0x1f
1028c: 0016d793 srl a5,a3,0x1
10290: 55555637 lui a2,0x55555
10294: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x555377d1>
10298: 00f5e7b3 or a5,a1,a5
1029c: 00c7f7b3 and a5,a5,a2
102a0: 00175593 srl a1,a4,0x1
102a4: 40f687b3 sub a5,a3,a5
102a8: 00c5f633 and a2,a1,a2
102ac: 00f6b6b3 sltu a3,a3,a5
102b0: 40c70733 sub a4,a4,a2
102b4: 40d70733 sub a4,a4,a3
102b8: 01e71593 sll a1,a4,0x1e
102bc: 0027d693 srl a3,a5,0x2
102c0: 33333637 lui a2,0x33333
102c4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155af>
102c8: 00d5e6b3 or a3,a1,a3
102cc: 00c6f6b3 and a3,a3,a2
102d0: 00275593 srl a1,a4,0x2
102d4: 00c7f7b3 and a5,a5,a2
102d8: 00f687b3 add a5,a3,a5
102dc: 00c5f5b3 and a1,a1,a2
102e0: 00c77733 and a4,a4,a2
102e4: 00e58733 add a4,a1,a4
102e8: 00d7b6b3 sltu a3,a5,a3
102ec: 00e686b3 add a3,a3,a4
102f0: 01c69613 sll a2,a3,0x1c
102f4: 0047d713 srl a4,a5,0x4
102f8: 00e66733 or a4,a2,a4
102fc: 00f707b3 add a5,a4,a5
10300: 0046d613 srl a2,a3,0x4
10304: 00d60633 add a2,a2,a3
10308: 00e7b733 sltu a4,a5,a4
1030c: 0f0f16b7 lui a3,0xf0f1
10310: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b>
10314: 00c70733 add a4,a4,a2
10318: 00d77733 and a4,a4,a3
1031c: 00d7f7b3 and a5,a5,a3
10320: 01871613 sll a2,a4,0x18
10324: 0087d693 srl a3,a5,0x8
10328: 00d666b3 or a3,a2,a3
1032c: 00f687b3 add a5,a3,a5
10330: 00875613 srl a2,a4,0x8
10334: 00e60733 add a4,a2,a4
10338: 00d7b6b3 sltu a3,a5,a3
1033c: 00e686b3 add a3,a3,a4
10340: 01069613 sll a2,a3,0x10
10344: 0107d713 srl a4,a5,0x10
10348: 00e66733 or a4,a2,a4
1034c: 00f707b3 add a5,a4,a5
10350: 0106d613 srl a2,a3,0x10
10354: 00e7b733 sltu a4,a5,a4
10358: 00d606b3 add a3,a2,a3
1035c: 00d70733 add a4,a4,a3
10360: 00f70733 add a4,a4,a5
10364: 07f77713 and a4,a4,127
10368: 04000513 li a0,64
1036c: 40e50533 sub a0,a0,a4
10370: 01051513 sll a0,a0,0x10
10374: 01055513 srl a0,a0,0x10
10378: 00008067 ret
0001037c <HammingDistance>:
1037c: fe010113 add sp,sp,-32
10380: 00812c23 sw s0,24(sp)
10384: 00912a23 sw s1,20(sp)
10388: 01212823 sw s2,16(sp)
1038c: 01312623 sw s3,12(sp)
10390: 00112e23 sw ra,28(sp)
10394: 00060413 mv s0,a2
10398: 00068913 mv s2,a3
1039c: 00058993 mv s3,a1
103a0: 00050493 mv s1,a0
103a4: 08b6e863 bltu a3,a1,10434 <HammingDistance+0xb8>
103a8: 00060513 mv a0,a2
103ac: 00068593 mv a1,a3
103b0: 08d98063 beq s3,a3,10430 <HammingDistance+0xb4>
103b4: e59ff0ef jal 1020c <count_leading_zeros>
103b8: 04000793 li a5,64
103bc: 40a787b3 sub a5,a5,a0
103c0: 01079793 sll a5,a5,0x10
103c4: 4107d793 sra a5,a5,0x10
103c8: 00000513 li a0,0
103cc: 04f05463 blez a5,10414 <HammingDistance+0x98>
103d0: fff78793 add a5,a5,-1
103d4: 0084c733 xor a4,s1,s0
103d8: 01079693 sll a3,a5,0x10
103dc: 01f99593 sll a1,s3,0x1f
103e0: 01f91613 sll a2,s2,0x1f
103e4: 00177713 and a4,a4,1
103e8: 0014d493 srl s1,s1,0x1
103ec: 00145413 srl s0,s0,0x1
103f0: 01079793 sll a5,a5,0x10
103f4: 0106d693 srl a3,a3,0x10
103f8: 00e50533 add a0,a0,a4
103fc: 0095e4b3 or s1,a1,s1
10400: 0019d993 srl s3,s3,0x1
10404: 00866433 or s0,a2,s0
10408: 00195913 srl s2,s2,0x1
1040c: 4107d793 sra a5,a5,0x10
10410: fc0690e3 bnez a3,103d0 <HammingDistance+0x54>
10414: 01c12083 lw ra,28(sp)
10418: 01812403 lw s0,24(sp)
1041c: 01412483 lw s1,20(sp)
10420: 01012903 lw s2,16(sp)
10424: 00c12983 lw s3,12(sp)
10428: 02010113 add sp,sp,32
1042c: 00008067 ret
10430: f89672e3 bgeu a2,s1,103b4 <HammingDistance+0x38>
10434: 00048513 mv a0,s1
10438: 00098593 mv a1,s3
1043c: f79ff06f j 103b4 <HammingDistance+0x38>
```
:::
#### Observation
* Register Usage
| Register | CLZ | Hammingdistnce |
|:--------:|:--------------:|:--------------:|
| a-type | 6 (a0-a5) | 6(a0-a5) |
| s-type | 0 | 4(s0-s3) |
| t-type | 0 | 0 |
| other | 0 | ra, sp |
* Stack Usage
| Stack | CLZ | Hammingdistance |
| ----- | --- |:---------------:|
| Usage |0 | 32 |
* Lines of code/`lw`/`sw`
| Line of code | lw | sw |
|:------------:| --- | --- |
| 144 | 5 | 5 |
O2 optimaztion is similar to O1.
### O3 Optimized
Optimize yet more. Turns on all optimizations specified by -O2 and also turns on some optimization flags, like `-floop-unroll-and-jam`
```c
$ ../../build/rv32emu Hammingdist.elf
Hamming Distance = 21
Hamming Distance = 63
Hamming Distance = 0
cycle count: 7363
instret: 2cb
inferior exit code 0
$ riscv-none-elf-readelf -h Hammingdist.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x10334
Start of program headers: 52 (bytes into file)
Start of section headers: 70132 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
$ riscv-none-elf-size Hammingdist.elf
text data bss dec hex filename
52074 1928 1528 55530 d8ea Hammingdist.elf
```
:::spoiler **Disaasemble**
```c
0001020c <count_leading_zeros>:
1020c: 01f59713 sll a4,a1,0x1f
10210: 00155793 srl a5,a0,0x1
10214: 00f767b3 or a5,a4,a5
10218: 0015d713 srl a4,a1,0x1
1021c: 00b765b3 or a1,a4,a1
10220: 00a7e533 or a0,a5,a0
10224: 01e59713 sll a4,a1,0x1e
10228: 00255793 srl a5,a0,0x2
1022c: 00f767b3 or a5,a4,a5
10230: 0025d713 srl a4,a1,0x2
10234: 00b76733 or a4,a4,a1
10238: 00a7e533 or a0,a5,a0
1023c: 01c71693 sll a3,a4,0x1c
10240: 00455793 srl a5,a0,0x4
10244: 00f6e7b3 or a5,a3,a5
10248: 00475693 srl a3,a4,0x4
1024c: 00e6e6b3 or a3,a3,a4
10250: 00a7e733 or a4,a5,a0
10254: 01869613 sll a2,a3,0x18
10258: 00875793 srl a5,a4,0x8
1025c: 00f667b3 or a5,a2,a5
10260: 0086d613 srl a2,a3,0x8
10264: 00d66633 or a2,a2,a3
10268: 00e7e7b3 or a5,a5,a4
1026c: 0107d693 srl a3,a5,0x10
10270: 01061713 sll a4,a2,0x10
10274: 00d766b3 or a3,a4,a3
10278: 01065713 srl a4,a2,0x10
1027c: 00c76733 or a4,a4,a2
10280: 00f6e6b3 or a3,a3,a5
10284: 00d766b3 or a3,a4,a3
10288: 01f71593 sll a1,a4,0x1f
1028c: 0016d793 srl a5,a3,0x1
10290: 55555637 lui a2,0x55555
10294: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x555377d1>
10298: 00f5e7b3 or a5,a1,a5
1029c: 00c7f7b3 and a5,a5,a2
102a0: 00175593 srl a1,a4,0x1
102a4: 40f687b3 sub a5,a3,a5
102a8: 00c5f633 and a2,a1,a2
102ac: 00f6b6b3 sltu a3,a3,a5
102b0: 40c70733 sub a4,a4,a2
102b4: 40d70733 sub a4,a4,a3
102b8: 01e71593 sll a1,a4,0x1e
102bc: 0027d693 srl a3,a5,0x2
102c0: 33333637 lui a2,0x33333
102c4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155af>
102c8: 00d5e6b3 or a3,a1,a3
102cc: 00c6f6b3 and a3,a3,a2
102d0: 00275593 srl a1,a4,0x2
102d4: 00c7f7b3 and a5,a5,a2
102d8: 00f687b3 add a5,a3,a5
102dc: 00c5f5b3 and a1,a1,a2
102e0: 00c77733 and a4,a4,a2
102e4: 00e58733 add a4,a1,a4
102e8: 00d7b6b3 sltu a3,a5,a3
102ec: 00e686b3 add a3,a3,a4
102f0: 01c69613 sll a2,a3,0x1c
102f4: 0047d713 srl a4,a5,0x4
102f8: 00e66733 or a4,a2,a4
102fc: 00f707b3 add a5,a4,a5
10300: 0046d613 srl a2,a3,0x4
10304: 00d60633 add a2,a2,a3
10308: 00e7b733 sltu a4,a5,a4
1030c: 0f0f16b7 lui a3,0xf0f1
10310: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b>
10314: 00c70733 add a4,a4,a2
10318: 00d77733 and a4,a4,a3
1031c: 00d7f7b3 and a5,a5,a3
10320: 01871613 sll a2,a4,0x18
10324: 0087d693 srl a3,a5,0x8
10328: 00d666b3 or a3,a2,a3
1032c: 00f687b3 add a5,a3,a5
10330: 00875613 srl a2,a4,0x8
10334: 00e60733 add a4,a2,a4
10338: 00d7b6b3 sltu a3,a5,a3
1033c: 00e686b3 add a3,a3,a4
10340: 01069613 sll a2,a3,0x10
10344: 0107d713 srl a4,a5,0x10
10348: 00e66733 or a4,a2,a4
1034c: 00f707b3 add a5,a4,a5
10350: 0106d613 srl a2,a3,0x10
10354: 00e7b733 sltu a4,a5,a4
10358: 00d606b3 add a3,a2,a3
1035c: 00d70733 add a4,a4,a3
10360: 00f70733 add a4,a4,a5
10364: 07f77713 and a4,a4,127
10368: 04000513 li a0,64
1036c: 40e50533 sub a0,a0,a4
10370: 01051513 sll a0,a0,0x10
10374: 01055513 srl a0,a0,0x10
10378: 00008067 ret
0001037c <HammingDistance>:
1037c: 00050713 mv a4,a0
10380: 00060893 mv a7,a2
10384: 00068813 mv a6,a3
10388: 1ab6ee63 bltu a3,a1,10544 <HammingDistance+0x1c8>
1038c: 1ad58a63 beq a1,a3,10540 <HammingDistance+0x1c4>
10390: 01f81793 sll a5,a6,0x1f
10394: 0018d513 srl a0,a7,0x1
10398: 00a7e533 or a0,a5,a0
1039c: 00185793 srl a5,a6,0x1
103a0: 0107e7b3 or a5,a5,a6
103a4: 01156833 or a6,a0,a7
103a8: 00285513 srl a0,a6,0x2
103ac: 01e79893 sll a7,a5,0x1e
103b0: 00a8e533 or a0,a7,a0
103b4: 0027d893 srl a7,a5,0x2
103b8: 0117e7b3 or a5,a5,a7
103bc: 00a86833 or a6,a6,a0
103c0: 01c79893 sll a7,a5,0x1c
103c4: 00485513 srl a0,a6,0x4
103c8: 00a8e533 or a0,a7,a0
103cc: 0047d893 srl a7,a5,0x4
103d0: 0117e7b3 or a5,a5,a7
103d4: 00a86833 or a6,a6,a0
103d8: 01879893 sll a7,a5,0x18
103dc: 00885513 srl a0,a6,0x8
103e0: 00a8e533 or a0,a7,a0
103e4: 0087d893 srl a7,a5,0x8
103e8: 0117e7b3 or a5,a5,a7
103ec: 00a86833 or a6,a6,a0
103f0: 01079893 sll a7,a5,0x10
103f4: 01085513 srl a0,a6,0x10
103f8: 00a8e533 or a0,a7,a0
103fc: 0107d893 srl a7,a5,0x10
10400: 0117e7b3 or a5,a5,a7
10404: 00a86833 or a6,a6,a0
10408: 00f86833 or a6,a6,a5
1040c: 01f79313 sll t1,a5,0x1f
10410: 00185513 srl a0,a6,0x1
10414: 555558b7 lui a7,0x55555
10418: 55588893 add a7,a7,1365 # 55555555 <__BSS_END__+0x555377d1>
1041c: 00a36533 or a0,t1,a0
10420: 01157533 and a0,a0,a7
10424: 0017d313 srl t1,a5,0x1
10428: 40a80533 sub a0,a6,a0
1042c: 011378b3 and a7,t1,a7
10430: 00a83833 sltu a6,a6,a0
10434: 411787b3 sub a5,a5,a7
10438: 410787b3 sub a5,a5,a6
1043c: 01e79313 sll t1,a5,0x1e
10440: 00255813 srl a6,a0,0x2
10444: 333338b7 lui a7,0x33333
10448: 33388893 add a7,a7,819 # 33333333 <__BSS_END__+0x333155af>
1044c: 01036833 or a6,t1,a6
10450: 01187833 and a6,a6,a7
10454: 0027d313 srl t1,a5,0x2
10458: 01157533 and a0,a0,a7
1045c: 00a80533 add a0,a6,a0
10460: 01137333 and t1,t1,a7
10464: 0117f7b3 and a5,a5,a7
10468: 01053833 sltu a6,a0,a6
1046c: 00f307b3 add a5,t1,a5
10470: 00f80833 add a6,a6,a5
10474: 01c81893 sll a7,a6,0x1c
10478: 00455793 srl a5,a0,0x4
1047c: 00f8e7b3 or a5,a7,a5
10480: 00a78533 add a0,a5,a0
10484: 00485893 srl a7,a6,0x4
10488: 010888b3 add a7,a7,a6
1048c: 00f537b3 sltu a5,a0,a5
10490: 0f0f1837 lui a6,0xf0f1
10494: f0f80813 add a6,a6,-241 # f0f0f0f <__BSS_END__+0xf0d318b>
10498: 011787b3 add a5,a5,a7
1049c: 0107f7b3 and a5,a5,a6
104a0: 01057533 and a0,a0,a6
104a4: 01879893 sll a7,a5,0x18
104a8: 00855813 srl a6,a0,0x8
104ac: 0108e833 or a6,a7,a6
104b0: 01050833 add a6,a0,a6
104b4: 0087d893 srl a7,a5,0x8
104b8: 011787b3 add a5,a5,a7
104bc: 00a83533 sltu a0,a6,a0
104c0: 00f50533 add a0,a0,a5
104c4: 01051893 sll a7,a0,0x10
104c8: 01085793 srl a5,a6,0x10
104cc: 00f8e7b3 or a5,a7,a5
104d0: 00f807b3 add a5,a6,a5
104d4: 01055893 srl a7,a0,0x10
104d8: 0107b833 sltu a6,a5,a6
104dc: 01150533 add a0,a0,a7
104e0: 00a80833 add a6,a6,a0
104e4: 010787b3 add a5,a5,a6
104e8: 07f7f513 and a0,a5,127
104ec: 00050793 mv a5,a0
104f0: 06050063 beqz a0,10550 <HammingDistance+0x1d4>
104f4: 00000513 li a0,0
104f8: fff78793 add a5,a5,-1
104fc: 00c74833 xor a6,a4,a2
10500: 01079893 sll a7,a5,0x10
10504: 01f59e13 sll t3,a1,0x1f
10508: 01f69313 sll t1,a3,0x1f
1050c: 00187813 and a6,a6,1
10510: 00175713 srl a4,a4,0x1
10514: 00165613 srl a2,a2,0x1
10518: 01079793 sll a5,a5,0x10
1051c: 0108d893 srl a7,a7,0x10
10520: 01050533 add a0,a0,a6
10524: 00ee6733 or a4,t3,a4
10528: 0015d593 srl a1,a1,0x1
1052c: 00c36633 or a2,t1,a2
10530: 0016d693 srl a3,a3,0x1
10534: 4107d793 sra a5,a5,0x10
10538: fc0890e3 bnez a7,104f8 <HammingDistance+0x17c>
1053c: 00008067 ret
10540: e4a678e3 bgeu a2,a0,10390 <HammingDistance+0x14>
10544: 00070893 mv a7,a4
10548: 00058813 mv a6,a1
1054c: e45ff06f j 10390 <HammingDistance+0x14>
10550: 00000513 li a0,0
10554: 00008067 ret
```
:::
#### Observation
* Register Usage
| Register | CLZ | Hammingdistnce |
|:--------:|:--------------:|:--------------:|
| a-type | 6 (a0-a5) | 8(a0-a7) |
| s-type | 0 | 0 |
| t-type | 0 | 2(t1, t3) |
| other | 0 | 0 |
* Stack Usage
| Stack | CLZ | Hammingdistance |
| ----- | --- |:---------------:|
| Usage |0 | 0 |
* Lines of code/`lw`/`sw`
| Line of code | lw | sw |
|:------------:| --- | --- |
| 214 | 0 | 0 |
Compare to -O2, line of code is longer because of loop unrolling.
### Os Optimized
Optimize for size
```c
$ ../../build/rv32emu Hammingdist.elf
Hamming Distance = 21
Hamming Distance = 63
Hamming Distance = 0
cycle count: 7400
instret: 2cb
inferior exit code 0
$ riscv-none-elf-readelf -h Hammingdist.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x1018a
Start of program headers: 52 (bytes into file)
Start of section headers: 69564 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
$ riscv-none-elf-size Hammingdist.elf
text data bss dec hex filename
51786 1928 1528 55242 d7ca Hammingdist.elf
```
:::spoiler **Disassemble**
```c
0001020c <count_leading_zeros>:
1020c: 01f59713 sll a4,a1,0x1f
10210: 00155793 srl a5,a0,0x1
10214: 00f767b3 or a5,a4,a5
10218: 0015d713 srl a4,a1,0x1
1021c: 00b765b3 or a1,a4,a1
10220: 00a7e533 or a0,a5,a0
10224: 01e59713 sll a4,a1,0x1e
10228: 00255793 srl a5,a0,0x2
1022c: 00f767b3 or a5,a4,a5
10230: 0025d613 srl a2,a1,0x2
10234: 00b66633 or a2,a2,a1
10238: 00a7e533 or a0,a5,a0
1023c: 01c61713 sll a4,a2,0x1c
10240: 00455793 srl a5,a0,0x4
10244: 00f767b3 or a5,a4,a5
10248: 00465693 srl a3,a2,0x4
1024c: 00a7e733 or a4,a5,a0
10250: 00c6e6b3 or a3,a3,a2
10254: 01869613 sll a2,a3,0x18
10258: 00875793 srl a5,a4,0x8
1025c: 00f667b3 or a5,a2,a5
10260: 0086d613 srl a2,a3,0x8
10264: 00d66633 or a2,a2,a3
10268: 00e7e7b3 or a5,a5,a4
1026c: 0107d693 srl a3,a5,0x10
10270: 01061713 sll a4,a2,0x10
10274: 00d766b3 or a3,a4,a3
10278: 01065713 srl a4,a2,0x10
1027c: 00c76733 or a4,a4,a2
10280: 00f6e6b3 or a3,a3,a5
10284: 00d766b3 or a3,a4,a3
10288: 01f71613 sll a2,a4,0x1f
1028c: 0016d793 srl a5,a3,0x1
10290: 00f667b3 or a5,a2,a5
10294: 55555637 lui a2,0x55555
10298: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x555377d1>
1029c: 00175593 srl a1,a4,0x1
102a0: 00c7f7b3 and a5,a5,a2
102a4: 40f687b3 sub a5,a3,a5
102a8: 00c5f633 and a2,a1,a2
102ac: 00f6b6b3 sltu a3,a3,a5
102b0: 40c70733 sub a4,a4,a2
102b4: 40d70733 sub a4,a4,a3
102b8: 01e71613 sll a2,a4,0x1e
102bc: 0027d693 srl a3,a5,0x2
102c0: 00d666b3 or a3,a2,a3
102c4: 33333637 lui a2,0x33333
102c8: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155af>
102cc: 00c6f6b3 and a3,a3,a2
102d0: 00275593 srl a1,a4,0x2
102d4: 00c7f7b3 and a5,a5,a2
102d8: 00c5f5b3 and a1,a1,a2
102dc: 00f687b3 add a5,a3,a5
102e0: 00c77733 and a4,a4,a2
102e4: 00e58733 add a4,a1,a4
102e8: 00d7b6b3 sltu a3,a5,a3
102ec: 00e686b3 add a3,a3,a4
102f0: 01c69613 sll a2,a3,0x1c
102f4: 0047d713 srl a4,a5,0x4
102f8: 00e66733 or a4,a2,a4
102fc: 00f707b3 add a5,a4,a5
10300: 0046d613 srl a2,a3,0x4
10304: 00d606b3 add a3,a2,a3
10308: 00e7b733 sltu a4,a5,a4
1030c: 00d70733 add a4,a4,a3
10310: 0f0f16b7 lui a3,0xf0f1
10314: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b>
10318: 00d77733 and a4,a4,a3
1031c: 00d7f7b3 and a5,a5,a3
10320: 01871613 sll a2,a4,0x18
10324: 0087d693 srl a3,a5,0x8
10328: 00d666b3 or a3,a2,a3
1032c: 00f687b3 add a5,a3,a5
10330: 00875613 srl a2,a4,0x8
10334: 00e60733 add a4,a2,a4
10338: 00d7b6b3 sltu a3,a5,a3
1033c: 00e686b3 add a3,a3,a4
10340: 01069613 sll a2,a3,0x10
10344: 0107d713 srl a4,a5,0x10
10348: 00e66733 or a4,a2,a4
1034c: 00f707b3 add a5,a4,a5
10350: 0106d613 srl a2,a3,0x10
10354: 00e7b733 sltu a4,a5,a4
10358: 00d606b3 add a3,a2,a3
1035c: 00d70733 add a4,a4,a3
10360: 00f70733 add a4,a4,a5
10364: 07f77713 and a4,a4,127
10368: 04000513 li a0,64
1036c: 40e50533 sub a0,a0,a4
10370: 01051513 sll a0,a0,0x10
10374: 01055513 srl a0,a0,0x10
10378: 00008067 ret
0001037c <HammingDistance>:
1037c: fe010113 add sp,sp,-32
10380: 00812c23 sw s0,24(sp)
10384: 00912a23 sw s1,20(sp)
10388: 01212823 sw s2,16(sp)
1038c: 01312623 sw s3,12(sp)
10390: 00112e23 sw ra,28(sp)
10394: 00050413 mv s0,a0
10398: 00058913 mv s2,a1
1039c: 00060493 mv s1,a2
103a0: 00068993 mv s3,a3
103a4: 00b6ea63 bltu a3,a1,103b8 <HammingDistance+0x3c>
103a8: 00060513 mv a0,a2
103ac: 00068593 mv a1,a3
103b0: 00d91863 bne s2,a3,103c0 <HammingDistance+0x44>
103b4: 00867663 bgeu a2,s0,103c0 <HammingDistance+0x44>
103b8: 00040513 mv a0,s0
103bc: 00090593 mv a1,s2
103c0: e4dff0ef jal 1020c <count_leading_zeros>
103c4: 04000793 li a5,64
103c8: 40a787b3 sub a5,a5,a0
103cc: 01079793 sll a5,a5,0x10
103d0: 4107d793 sra a5,a5,0x10
103d4: 00000513 li a0,0
103d8: 02f04063 bgtz a5,103f8 <HammingDistance+0x7c>
103dc: 01c12083 lw ra,28(sp)
103e0: 01812403 lw s0,24(sp)
103e4: 01412483 lw s1,20(sp)
103e8: 01012903 lw s2,16(sp)
103ec: 00c12983 lw s3,12(sp)
103f0: 02010113 add sp,sp,32
103f4: 00008067 ret
103f8: 00944733 xor a4,s0,s1
103fc: 00177713 and a4,a4,1
10400: 00070463 beqz a4,10408 <HammingDistance+0x8c>
10404: 00150513 add a0,a0,1
10408: 01f91713 sll a4,s2,0x1f
1040c: 00145413 srl s0,s0,0x1
10410: fff78793 add a5,a5,-1
10414: 00876433 or s0,a4,s0
10418: 0014d493 srl s1,s1,0x1
1041c: 01f99713 sll a4,s3,0x1f
10420: 01079793 sll a5,a5,0x10
10424: 00195913 srl s2,s2,0x1
10428: 009764b3 or s1,a4,s1
1042c: 0019d993 srl s3,s3,0x1
10430: 4107d793 sra a5,a5,0x10
10434: fa5ff06f j 103d8 <HammingDistance+0x5c>
```
:::
#### Observation
* Register Usage
| Register | CLZ | Hammingdistnce |
|:--------:|:--------------:|:--------------:|
| a-type | 6 (a0-a5) | 6(a0-a5) |
| s-type | 0 | 4(s0-s3) |
| t-type | 0 | 0 |
| other | 0 | ra, sp |
* Stack Usage
| Stack | CLZ | Hammingdistance |
| ----- | --- |:---------------:|
| Usage |0 | 32 |
* Lines of code/`lw`/`sw`
| Line of code | lw | sw |
|:------------:| --- | --- |
| 142 | 5 | 5 |
### `Ofast`
Optimized for speed
```c
$ ../../build/rv32emu Hammingdist.elf
Hamming Distance = 21
Hamming Distance = 63
Hamming Distance = 0
cycle count: 7363
instret: 2cb
inferior exit code 0
$ riscv-none-elf-readelf -h Hammingdist.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x102ee
Start of program headers: 52 (bytes into file)
Start of section headers: 69660 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
$ riscv-none-elf-size Hammingdist.elf
text data bss dec hex filename
52074 1928 1528 55530 d8ea Hammingdist.elf
```
:::spoiler **Disassemble**
```c
0001020c <count_leading_zeros>:
1020c: 01f59713 sll a4,a1,0x1f
10210: 00155793 srl a5,a0,0x1
10214: 00f767b3 or a5,a4,a5
10218: 0015d713 srl a4,a1,0x1
1021c: 00b765b3 or a1,a4,a1
10220: 00a7e533 or a0,a5,a0
10224: 01e59713 sll a4,a1,0x1e
10228: 00255793 srl a5,a0,0x2
1022c: 00f767b3 or a5,a4,a5
10230: 0025d713 srl a4,a1,0x2
10234: 00b76733 or a4,a4,a1
10238: 00a7e533 or a0,a5,a0
1023c: 01c71693 sll a3,a4,0x1c
10240: 00455793 srl a5,a0,0x4
10244: 00f6e7b3 or a5,a3,a5
10248: 00475693 srl a3,a4,0x4
1024c: 00e6e6b3 or a3,a3,a4
10250: 00a7e733 or a4,a5,a0
10254: 01869613 sll a2,a3,0x18
10258: 00875793 srl a5,a4,0x8
1025c: 00f667b3 or a5,a2,a5
10260: 0086d613 srl a2,a3,0x8
10264: 00d66633 or a2,a2,a3
10268: 00e7e7b3 or a5,a5,a4
1026c: 0107d693 srl a3,a5,0x10
10270: 01061713 sll a4,a2,0x10
10274: 00d766b3 or a3,a4,a3
10278: 01065713 srl a4,a2,0x10
1027c: 00c76733 or a4,a4,a2
10280: 00f6e6b3 or a3,a3,a5
10284: 00d766b3 or a3,a4,a3
10288: 01f71593 sll a1,a4,0x1f
1028c: 0016d793 srl a5,a3,0x1
10290: 55555637 lui a2,0x55555
10294: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x555377d1>
10298: 00f5e7b3 or a5,a1,a5
1029c: 00c7f7b3 and a5,a5,a2
102a0: 00175593 srl a1,a4,0x1
102a4: 40f687b3 sub a5,a3,a5
102a8: 00c5f633 and a2,a1,a2
102ac: 00f6b6b3 sltu a3,a3,a5
102b0: 40c70733 sub a4,a4,a2
102b4: 40d70733 sub a4,a4,a3
102b8: 01e71593 sll a1,a4,0x1e
102bc: 0027d693 srl a3,a5,0x2
102c0: 33333637 lui a2,0x33333
102c4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155af>
102c8: 00d5e6b3 or a3,a1,a3
102cc: 00c6f6b3 and a3,a3,a2
102d0: 00275593 srl a1,a4,0x2
102d4: 00c7f7b3 and a5,a5,a2
102d8: 00f687b3 add a5,a3,a5
102dc: 00c5f5b3 and a1,a1,a2
102e0: 00c77733 and a4,a4,a2
102e4: 00e58733 add a4,a1,a4
102e8: 00d7b6b3 sltu a3,a5,a3
102ec: 00e686b3 add a3,a3,a4
102f0: 01c69613 sll a2,a3,0x1c
102f4: 0047d713 srl a4,a5,0x4
102f8: 00e66733 or a4,a2,a4
102fc: 00f707b3 add a5,a4,a5
10300: 0046d613 srl a2,a3,0x4
10304: 00d60633 add a2,a2,a3
10308: 00e7b733 sltu a4,a5,a4
1030c: 0f0f16b7 lui a3,0xf0f1
10310: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d318b>
10314: 00c70733 add a4,a4,a2
10318: 00d77733 and a4,a4,a3
1031c: 00d7f7b3 and a5,a5,a3
10320: 01871613 sll a2,a4,0x18
10324: 0087d693 srl a3,a5,0x8
10328: 00d666b3 or a3,a2,a3
1032c: 00f687b3 add a5,a3,a5
10330: 00875613 srl a2,a4,0x8
10334: 00e60733 add a4,a2,a4
10338: 00d7b6b3 sltu a3,a5,a3
1033c: 00e686b3 add a3,a3,a4
10340: 01069613 sll a2,a3,0x10
10344: 0107d713 srl a4,a5,0x10
10348: 00e66733 or a4,a2,a4
1034c: 00f707b3 add a5,a4,a5
10350: 0106d613 srl a2,a3,0x10
10354: 00e7b733 sltu a4,a5,a4
10358: 00d606b3 add a3,a2,a3
1035c: 00d70733 add a4,a4,a3
10360: 00f70733 add a4,a4,a5
10364: 07f77713 and a4,a4,127
10368: 04000513 li a0,64
1036c: 40e50533 sub a0,a0,a4
10370: 01051513 sll a0,a0,0x10
10374: 01055513 srl a0,a0,0x10
10378: 00008067 ret
0001037c <HammingDistance>:
1037c: 00050713 mv a4,a0
10380: 00060893 mv a7,a2
10384: 00068813 mv a6,a3
10388: 1ab6ee63 bltu a3,a1,10544 <HammingDistance+0x1c8>
1038c: 1ad58a63 beq a1,a3,10540 <HammingDistance+0x1c4>
10390: 01f81793 sll a5,a6,0x1f
10394: 0018d513 srl a0,a7,0x1
10398: 00a7e533 or a0,a5,a0
1039c: 00185793 srl a5,a6,0x1
103a0: 0107e7b3 or a5,a5,a6
103a4: 01156833 or a6,a0,a7
103a8: 00285513 srl a0,a6,0x2
103ac: 01e79893 sll a7,a5,0x1e
103b0: 00a8e533 or a0,a7,a0
103b4: 0027d893 srl a7,a5,0x2
103b8: 0117e7b3 or a5,a5,a7
103bc: 00a86833 or a6,a6,a0
103c0: 01c79893 sll a7,a5,0x1c
103c4: 00485513 srl a0,a6,0x4
103c8: 00a8e533 or a0,a7,a0
103cc: 0047d893 srl a7,a5,0x4
103d0: 0117e7b3 or a5,a5,a7
103d4: 00a86833 or a6,a6,a0
103d8: 01879893 sll a7,a5,0x18
103dc: 00885513 srl a0,a6,0x8
103e0: 00a8e533 or a0,a7,a0
103e4: 0087d893 srl a7,a5,0x8
103e8: 0117e7b3 or a5,a5,a7
103ec: 00a86833 or a6,a6,a0
103f0: 01079893 sll a7,a5,0x10
103f4: 01085513 srl a0,a6,0x10
103f8: 00a8e533 or a0,a7,a0
103fc: 0107d893 srl a7,a5,0x10
10400: 0117e7b3 or a5,a5,a7
10404: 00a86833 or a6,a6,a0
10408: 00f86833 or a6,a6,a5
1040c: 01f79313 sll t1,a5,0x1f
10410: 00185513 srl a0,a6,0x1
10414: 555558b7 lui a7,0x55555
10418: 55588893 add a7,a7,1365 # 55555555 <__BSS_END__+0x555377d1>
1041c: 00a36533 or a0,t1,a0
10420: 01157533 and a0,a0,a7
10424: 0017d313 srl t1,a5,0x1
10428: 40a80533 sub a0,a6,a0
1042c: 011378b3 and a7,t1,a7
10430: 00a83833 sltu a6,a6,a0
10434: 411787b3 sub a5,a5,a7
10438: 410787b3 sub a5,a5,a6
1043c: 01e79313 sll t1,a5,0x1e
10440: 00255813 srl a6,a0,0x2
10444: 333338b7 lui a7,0x33333
10448: 33388893 add a7,a7,819 # 33333333 <__BSS_END__+0x333155af>
1044c: 01036833 or a6,t1,a6
10450: 01187833 and a6,a6,a7
10454: 0027d313 srl t1,a5,0x2
10458: 01157533 and a0,a0,a7
1045c: 00a80533 add a0,a6,a0
10460: 01137333 and t1,t1,a7
10464: 0117f7b3 and a5,a5,a7
10468: 01053833 sltu a6,a0,a6
1046c: 00f307b3 add a5,t1,a5
10470: 00f80833 add a6,a6,a5
10474: 01c81893 sll a7,a6,0x1c
10478: 00455793 srl a5,a0,0x4
1047c: 00f8e7b3 or a5,a7,a5
10480: 00a78533 add a0,a5,a0
10484: 00485893 srl a7,a6,0x4
10488: 010888b3 add a7,a7,a6
1048c: 00f537b3 sltu a5,a0,a5
10490: 0f0f1837 lui a6,0xf0f1
10494: f0f80813 add a6,a6,-241 # f0f0f0f <__BSS_END__+0xf0d318b>
10498: 011787b3 add a5,a5,a7
1049c: 0107f7b3 and a5,a5,a6
104a0: 01057533 and a0,a0,a6
104a4: 01879893 sll a7,a5,0x18
104a8: 00855813 srl a6,a0,0x8
104ac: 0108e833 or a6,a7,a6
104b0: 01050833 add a6,a0,a6
104b4: 0087d893 srl a7,a5,0x8
104b8: 011787b3 add a5,a5,a7
104bc: 00a83533 sltu a0,a6,a0
104c0: 00f50533 add a0,a0,a5
104c4: 01051893 sll a7,a0,0x10
104c8: 01085793 srl a5,a6,0x10
104cc: 00f8e7b3 or a5,a7,a5
104d0: 00f807b3 add a5,a6,a5
104d4: 01055893 srl a7,a0,0x10
104d8: 0107b833 sltu a6,a5,a6
104dc: 01150533 add a0,a0,a7
104e0: 00a80833 add a6,a6,a0
104e4: 010787b3 add a5,a5,a6
104e8: 07f7f513 and a0,a5,127
104ec: 00050793 mv a5,a0
104f0: 06050063 beqz a0,10550 <HammingDistance+0x1d4>
104f4: 00000513 li a0,0
104f8: fff78793 add a5,a5,-1
104fc: 00c74833 xor a6,a4,a2
10500: 01079893 sll a7,a5,0x10
10504: 01f59e13 sll t3,a1,0x1f
10508: 01f69313 sll t1,a3,0x1f
1050c: 00187813 and a6,a6,1
10510: 00175713 srl a4,a4,0x1
10514: 00165613 srl a2,a2,0x1
10518: 01079793 sll a5,a5,0x10
1051c: 0108d893 srl a7,a7,0x10
10520: 01050533 add a0,a0,a6
10524: 00ee6733 or a4,t3,a4
10528: 0015d593 srl a1,a1,0x1
1052c: 00c36633 or a2,t1,a2
10530: 0016d693 srl a3,a3,0x1
10534: 4107d793 sra a5,a5,0x10
10538: fc0890e3 bnez a7,104f8 <HammingDistance+0x17c>
1053c: 00008067 ret
10540: e4a678e3 bgeu a2,a0,10390 <HammingDistance+0x14>
10544: 00070893 mv a7,a4
10548: 00058813 mv a6,a1
1054c: e45ff06f j 10390 <HammingDistance+0x14>
10550: 00000513 li a0,0
10554: 00008067 ret
```
:::
#### Observation
* Register Usage
| Register | CLZ | Hammingdistnce |
|:--------:|:--------------:|:--------------:|
| a-type | 6 (a0-a5) | 8(a0-a7) |
| s-type | 0 | 4(s0-s3) |
| t-type | 0 | 2(t1, t3) |
| other | 0 | 0 |
* Stack Usage
| Stack | CLZ | Hammingdistance |
| ----- | --- |:---------------:|
| Usage |0 | 0 |
* Lines of code/`lw`/`sw`
| Line of code | lw | sw |
|:------------:| --- | --- |
| 214 | 0 | 0 |
### Conclusion
| Optimaztion level | O0 | O1 | O2 | O3 | Os | Ofast |
|:-----------------:| ----- | ----- | ----- | ----- | ----- | ----- |
| cycle counts | 11190 | 7198 | 7435 | 7363 | 7400 | 7363 |
| dec(from header) | 56362 | 55246 | 55250 | 55530 | 55242 | 55530 |
* **From -O0 to -O1** : Not only the usage `lw` instructions but also the registers are decreased significantly.Besides, the lines of code has been reduced by over half.
* **From -O1 to -O2** : The results are nearly identical, but it may be due to some flags not being suitable for the code, resulting in an increase in the number of cycles.
* **From -O2 to -O3 and O-fast** : Optimization has reached its limit. Utilizing the `t-type` register instead of the `s-type` register, and both stack usage and `lw`/`sw` instructions are zero.
* **From -O1 to -Os** : -Os is Optimize for size. `dec` is the summition of all three segments, `dec = .text + .data + .bss`.We can see that -Os optimization has the least amount of `dec`.
## Handwritten Assembly Code
### Original assembly code from [author_YP](https://hackmd.io/@O6C2C3zQQBanDM55QRZ7DQ)
:::spoiler Original assembly code
```c
.data
test_data_1: .dword 0x0000000000100000, 0x00000000000FFFFF # HD(1048576, 1048575) = 21
test_data_2: .dword 0x0000000000000001, 0x7FFFFFFFFFFFFFFE # HD(1, 9223372036854775806) = 63
test_data_3: .dword 0x000000028370228F, 0x000000028370228F # HD(10795098767, 10795098767) = 0
msg_string: .string "\nHamming Distance="
.text
main:
addi sp, sp, -12
# push pointers of test data onto the stack
la t0, test_data_1
sw t0, 0(sp)
la t0, test_data_2
sw t0, 4(sp)
la t0, test_data_3
sw t0, 8(sp)
# initialize main_loop
addi s0, zero, 3 # s0 : number of test case
addi s1, zero, 0 # s1 : test case counter
addi s2, sp, 0 # s2 : points to test_data_1
main_loop:
la a0, msg_string
li a7, 4 # print string
ecall
lw a0, 0(s2) # a0 : pointer to the first data in test_data_1
addi a1, a0, 8 # a1 : pointer to the second data in test_data_1
jal ra, hd_func
# print the result #
li a7, 1 # print integer
ecall # print result of hd_cal (which is in a0)
addi s2, s2, 4 # s2 : points to next test_data
addi s1, s1, 1 # counter++
bne s1, s0, main_loop
addi sp, sp, 12
li a7, 10
ecall
# hamming distance function
hd_func:
addi sp, sp, -36
sw ra, 0(sp)
sw s0, 4(sp) # address of x0
sw s1, 8(sp) # address of x1
sw s2, 12(sp) # digit of x0
sw s3, 16(sp) # digit of x1
sw s4, 20(sp) # lower part of x0
sw s5, 24(sp) # higher part of x0
sw s6, 28(sp) # lower part of x1
sw s7, 32(sp) # higher part of x1
# get address of x0 and x1
mv s0, a0 # s0 : address of x0
mv s1, a1 # s1 : address of x1
# get x0_digit
lw a0, 0(s0) # a0 : lower part of x0
lw a1, 4(s0) # a1 : higher part of x0
jal ra clz
li s2, 64
sub s2, s2, a0 # s2 : x0_digit (return value saved in a0)
# get x1_digit
lw a0, 0(s1) # a0 : lower part of x1
lw a1, 4(s1) # a1 : higher part of x1
jal ra clz
li s3, 64
sub s3, s3, a0 # s3 : x1_digit (return value saved in a0)
# get x0(s5 s4) and x1(s7 s6)
lw s4, 0(s0)
lw s5, 4(s0)
lw s6, 0(s1)
lw s7, 4(s1)
# compare with two digit
slt t0, s2, s3
bne t0, zero, x1_larger
mv s3, zero # s3: hd counter
bgt s2, zero, hd_cal_loop
# when digit is 0
mv a0, s2 # save max_digit to a0
j hd_func_end
x1_larger:
mv s2, s3 # s2 : max_digit
mv s3, zero # s3: hd counter
bgt s2, zero, hd_cal_loop
# when digit is 0
mv a0, s2 # save max_digit to a0
j hd_func_end
hd_func_end:
lw ra, 0(sp)
lw s0, 4(sp)
lw s1, 8(sp)
lw s2, 12(sp)
lw s3, 16(sp)
lw s4, 20(sp)
lw s5, 24(sp)
lw s6, 28(sp)
lw s7, 32(sp)
addi sp, sp, 36
ret
# hamming distance calculation (result save in a0, a1)
hd_cal_loop:
# when the current digit larger than 32
addi t2, zero, 32
bgt s2, t2, hd_getLSB_upper
# hd_getLSB_lower : and with 1
li t3, 0x00000001
and t4, s4, t3
and t5, s6, t3
j hd_cal_shift
hd_getLSB_upper:
# and with 1
li t3, 0x00000001
and t4, s5, t3
and t5, s7, t3
hd_cal_shift:
# (s5 s4) = x >> 1
srli t0, s4, 1
slli t1, s5, 31
or s4, t0, t1 # s4 >> 1
srli s5, s5, 1 # s5 >> 1
# (s7 s6) = x >> 1
srli t0, s6, 1
slli t1, s7, 31
or s6, t0, t1 # s6 >> 1
srli s7, s7, 1 # s7 >> 1
beq t4, t5, hd_check_loop
addi s3, s3, 1
hd_check_loop:
addi s2, s2, -1
bne s2, zero, hd_cal_loop
mv a0, s3 # save return value to a0
j hd_func_end
# count leading zeros
clz:
addi sp, sp, -4
sw ra, 0(sp)
beq a1, zero, clz_lower_set_one
clz_upper_set_one:
srli t1, a1, 1
or a1, a1, t1
srli t1, a1, 2
or a1, a1, t1
srli t1, a1, 4
or a1, a1, t1
srli t1, a1, 8
or a1, a1, t1
srli t1, a1, 16
or a1, a1, t1
li a0, 0xffffffff
j clz_count_ones
clz_lower_set_one:
srli t0, a0, 1
or a0, a0, t0
srli t0, a0, 2
or a0, a0, t0
srli t0, a0, 4
or a0, a0, t0
srli t0, a0, 8
or a0, a0, t0
srli t0, a0, 16
or a0, a0, t0
clz_count_ones:
# x = (a1 a0)
# x -= ((x >> 1) & 0x5555555555555555); #
srli t0, a0, 1
slli t1, a1, 31
or t0, t0, t1 # t0 >> 1
srli t1, a1, 1 # t1 >> 1
li t2, 0x55555555
and t0, t0, t2
and t1, t1, t2
sltu t3, a0, t0 # t3 : borrow bit
sub a0, a0, t0
sub a1, a1, t1
sub a1, a1, t3
# x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333); #
srli t0, a0, 2
slli t1, a1, 30
or t0, t0, t1 # t0 >> 2
srli t1, a1, 2 # t1 >> 2
li t2, 0x33333333
and t0, t0, t2
and t1, t1, t2
and t4, a0, t2
and t5, a1, t2
# (a1 a0) = (t1 t0) + (t5 t4)
add a0, t0, t4
sltu t3, a0, t0 # t3 : carry bit
add a1, t1, t5
add a1, a1, t3
# x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f; #
srli t0, a0, 4
slli t1, a1, 28
or t0, t0, t1 # t0 >> 4
srli t1, a1, 4 # t1 >> 4
add t0, t0, a0
sltu t3, t0, a0 # t3 : carry bit
add t1, t1, a1
add t1, t1, t3
li t2, 0x0f0f0f0f
and a0, t0, t2
and a1, t1, t2
# x += (x >> 8); #
srli t0, a0, 8
slli t1, a1, 24
or t0, t0, t1 # t0 >> 8
srli t1, a1, 8 # t1 >> 8
add a0, a0, t0
sltu t3, a0, t0 # t3 : carry bit
add a1, a1, t1
add a1, a1, t3 # (a1 a0) += (t1 t0)
# x += (x >> 16); #
srli t0, a0, 16
slli t1, a1, 16
or t0, t0, t1 # t0 >> 16
srli t1, a1, 16 # t1 >> 16
add a0, a0, t0
sltu t3, a0, t0 # t3 : carry bit
add a1, a1, t1
add a1, a1, t3 # (a1 a0) += (t1 t0)
# x += (x >> 32); #
# (t1 t0) = x >> 32
mv t0, a1
mv t1, zero
add a0, a0, t0
sltu t3, a0, t0 # t3 : carry bit
add a1, a1, t1
add a1, a1, t3 # (a1 a0) += (t1 t0)
# return (64 - (x & 0x7f));
# a0 = (x & 0x7f)
andi a0, a0, 0x7f
li t0, 64
sub a0, t0, a0 # a0 = (64 - (x & 0x7f))
lw ra, 0(sp)
addi sp, sp, 4
ret
```
:::
### Modify assembly code
We need to make slight modifications to the assembly code to make it compliant with the rv32emu specifications.(Please check [docs/syscall.md](https://github.com/sysprog21/rv32emu/blob/master/docs/syscall.md))
1. **System call**
If we want print string or exit in Ripes, we may use:
```c1!
.data
msg_string: .string "Hello world !"
main:
la a0, msg_string
li a7, 4 # print string
ecall
li a7, 10 # exit
ecall
```
But in rv32emu, we should modify it to:
```c1!
.set STDOUT, 1
.set WRITE, 64
.set EXIT, 93
.section .rodata
msg: .ascii "Hello world !"
_start:
li a0, STDOUT # file descriptor, 1 = stnadard output
la a1, msg # address of string
li a2, 7 # length of string
li a7, WRITE # syscall number for write
ecall
# MISSING: Check for error condition
li a0, 0 # 0 signals success
li a7, EXIT
ecall
```
2. **Entry label**
In rv32emu, we can determine where to start the program through .ld file. So we need to modify the entry label of assembly code according to .ld file. Our ld. file is shown as :
```c1!
OUTPUT_ARCH("riscv")
ENTRY(_start1)
SECTIONS
{
. = 0x0;
}
```
It means that _start1 is entry label, so we need to modify our assembly code as following :
```c1!
.set EXIT, 93
_start1: #instead of main
addi t0, t0, 1
Exit:
li a0, 0 # 0 signals success
li a7, EXIT
ecall
```
If we didn't obey this rule, you may get the error message when you compiling your file :
```c1!
riscv-none-elf-ld: warning: cannot find entry symbol _start; defaulting to 00000000
```
### Additional adjustments
1. `get_cycles` : In order to count the cycles in the program, I add the `get_cycles` which is shown above.
2. `syscall_printint`: Because rv32emu doesn't support printing integer, I edit system call and add an `printint` syscall.
```diff
#define SUPPORTED_SYSCALLS \
+ _(printint, 11) \
...
_(submit_queue, 0xFEED) \
_(setup_audio, 0xBABE) \
_(control_audio, 0xD00D), \
)
+ static void syscall_printint(riscv_t *rv)
+ {
+ uint32_t buffer = rv_get_reg(rv, rv_reg_a1);
+ fprintf(stdout, " %d ", buffer);
+ }
...
```
### Assembly Code(Modified)
:::spoiler Assembly Code(Modified)
```c
.global _start1
#newlib system calls
.set SYSEXIT, 93
.set SYSWRITE, 64
.set SYSINT, 11
.data
test_data_1: .dword 0x0000000000100000, 0x00000000000FFFFF # HD(1048576, 1048575) = 21
test_data_2: .dword 0x0000000000000001, 0x7FFFFFFFFFFFFFFE # HD(1, 9223372036854775806) = 63
test_data_3: .dword 0x000000028370228F, 0x000000028370228F # HD(10795098767, 10795098767) = 0
.section .rodata
msg1: .ascii "\nHamming Distance="
msg2: .ascii "\nCycle counts :"
.text
_start1:
addi sp, sp, -12
# push pointers of test data onto the stack
la t0, test_data_1
sw t0, 0(sp)
la t0, test_data_2
sw t0, 4(sp)
la t0, test_data_3
sw t0, 8(sp)
# initialize main_loop
addi s0, zero, 3 # s0 : number of test case
addi s1, zero, 0 # s1 : test case counter
addi s2, sp, 0 # s2 : points to test_data_1
get_cycles_init:
csrr t1, cycleh
csrr s10, cycle
csrr t2, cycleh
bne t1, t2, get_cycles_init
main_loop:
li a7, SYSWRITE #"write" syscall
li a0, 1 #1 = standard output (stdout)
la a1, msg1 # load address of msg string
li a2, 18 # length of msg string
ecall
lw a0, 0(s2) # a0 : pointer to the first data in test_data_1
addi a1, a0, 8 # a1 : pointer to the second data in test_data_1
jal ra, hd_func
# print the result #
li a7, SYSINT #"printint" syscall
add a1, a0, x0 # address of string(move result of hd_cal to a1)
li a0, 1 #1 = standard output (stdout)
ecall # print result of hd_cal
addi s2, s2, 4 # s2 : points to next test_data
addi s1, s1, 1 # counter++
bne s1, s0, main_loop
get_cycles_end:
csrr t1, cycleh
csrr s11, cycle
csrr t2, cycleh
bne t1, t2, get_cycles_end
sub s11, s11, s10
# print the result #
li a7, SYSWRITE #"write" syscall
li a0, 1 #1 = standard output (stdout)
la a1, msg2 # load address of msg string
li a2, 15 # length of msg string
ecall
li a7, SYSINT #"printint" syscall
add a1, s11, x0 # address of string(move result of hd_cal to a1)
li a0, 1 #1 = standard output (stdout)
ecall # print result of get_cycles_end
addi sp, sp, 12
li a0, 0 #0 signals success
li a7, SYSEXIT # "exit" syscall
ecall
# hamming distance function
hd_func:
addi sp, sp, -36
sw ra, 0(sp)
sw s0, 4(sp) # address of x0
sw s1, 8(sp) # address of x1
sw s2, 12(sp) # digit of x0
sw s3, 16(sp) # digit of x1
sw s4, 20(sp) # lower part of x0
sw s5, 24(sp) # higher part of x0
sw s6, 28(sp) # lower part of x1
sw s7, 32(sp) # higher part of x1
# get address of x0 and x1
mv s0, a0 # s0 : address of x0
mv s1, a1 # s1 : address of x1
# get x0_digit
lw a0, 0(s0) # a0 : lower part of x0
lw a1, 4(s0) # a1 : higher part of x0
jal ra, clz
li s2, 64
sub s2, s2, a0 # s2 : x0_digit (return value saved in a0)
# get x1_digit
lw a0, 0(s1) # a0 : lower part of x1
lw a1, 4(s1) # a1 : higher part of x1
jal ra, clz
li s3, 64
sub s3, s3, a0 # s3 : x1_digit (return value saved in a0)
# get x0(s5 s4) and x1(s7 s6)
lw s4, 0(s0)
lw s5, 4(s0)
lw s6, 0(s1)
lw s7, 4(s1)
# compare with two digit
slt t0, s2, s3
bne t0, zero, x1_larger
mv s3, zero # s3: hd counter
bgt s2, zero, hd_cal_loop
# when digit is 0
mv a0, s2 # save max_digit to a0
j hd_func_end
x1_larger:
mv s2, s3 # s2 : max_digit
mv s3, zero # s3: hd counter
bgt s2, zero, hd_cal_loop
# when digit is 0
mv a0, s2 # save max_digit to a0
j hd_func_end
hd_func_end:
lw ra, 0(sp)
lw s0, 4(sp)
lw s1, 8(sp)
lw s2, 12(sp)
lw s3, 16(sp)
lw s4, 20(sp)
lw s5, 24(sp)
lw s6, 28(sp)
lw s7, 32(sp)
addi sp, sp, 36
ret
# hamming distance calculation (result save in a0, a1)
hd_cal_loop:
# when the current digit larger than 32
addi t2, zero, 32
bgt s2, t2, hd_getLSB_upper
# hd_getLSB_lower : and with 1
li t3, 0x00000001
and t4, s4, t3
and t5, s6, t3
j hd_cal_shift
hd_getLSB_upper:
# and with 1
li t3, 0x00000001
and t4, s5, t3
and t5, s7, t3
hd_cal_shift:
# (s5 s4) = x >> 1
srli t0, s4, 1
slli t1, s5, 31
or s4, t0, t1 # s4 >> 1
srli s5, s5, 1 # s5 >> 1
# (s7 s6) = x >> 1
srli t0, s6, 1
slli t1, s7, 31
or s6, t0, t1 # s6 >> 1
srli s7, s7, 1 # s7 >> 1
beq t4, t5, hd_check_loop
addi s3, s3, 1
hd_check_loop:
addi s2, s2, -1
bne s2, zero, hd_cal_loop
mv a0, s3 # save return value to a0
j hd_func_end
# count leading zeros
clz:
addi sp, sp, -4
sw ra, 0(sp)
beq a1, zero, clz_lower_set_one
clz_upper_set_one:
srli t1, a1, 1
or a1, a1, t1
srli t1, a1, 2
or a1, a1, t1
srli t1, a1, 4
or a1, a1, t1
srli t1, a1, 8
or a1, a1, t1
srli t1, a1, 16
or a1, a1, t1
li a0, 0xffffffff
j clz_count_ones
clz_lower_set_one:
srli t0, a0, 1
or a0, a0, t0
srli t0, a0, 2
or a0, a0, t0
srli t0, a0, 4
or a0, a0, t0
srli t0, a0, 8
or a0, a0, t0
srli t0, a0, 16
or a0, a0, t0
clz_count_ones:
# x = (a1 a0)
# x -= ((x >> 1) & 0x5555555555555555); #
srli t0, a0, 1
slli t1, a1, 31
or t0, t0, t1 # t0 >> 1
srli t1, a1, 1 # t1 >> 1
li t2, 0x55555555
and t0, t0, t2
and t1, t1, t2
sltu t3, a0, t0 # t3 : borrow bit
sub a0, a0, t0
sub a1, a1, t1
sub a1, a1, t3
# x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333); #
srli t0, a0, 2
slli t1, a1, 30
or t0, t0, t1 # t0 >> 2
srli t1, a1, 2 # t1 >> 2
li t2, 0x33333333
and t0, t0, t2
and t1, t1, t2
and t4, a0, t2
and t5, a1, t2
# (a1 a0) = (t1 t0) + (t5 t4)
add a0, t0, t4
sltu t3, a0, t0 # t3 : carry bit
add a1, t1, t5
add a1, a1, t3
# x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f; #
srli t0, a0, 4
slli t1, a1, 28
or t0, t0, t1 # t0 >> 4
srli t1, a1, 4 # t1 >> 4
add t0, t0, a0
sltu t3, t0, a0 # t3 : carry bit
add t1, t1, a1
add t1, t1, t3
li t2, 0x0f0f0f0f
and a0, t0, t2
and a1, t1, t2
# x += (x >> 8); #
srli t0, a0, 8
slli t1, a1, 24
or t0, t0, t1 # t0 >> 8
srli t1, a1, 8 # t1 >> 8
add a0, a0, t0
sltu t3, a0, t0 # t3 : carry bit
add a1, a1, t1
add a1, a1, t3 # (a1 a0) += (t1 t0)
# x += (x >> 16); #
srli t0, a0, 16
slli t1, a1, 16
or t0, t0, t1 # t0 >> 16
srli t1, a1, 16 # t1 >> 16
add a0, a0, t0
sltu t3, a0, t0 # t3 : carry bit
add a1, a1, t1
add a1, a1, t3 # (a1 a0) += (t1 t0)
# x += (x >> 32); #
# (t1 t0) = x >> 32
mv t0, a1
mv t1, zero
add a0, a0, t0
sltu t3, a0, t0 # t3 : carry bit
add a1, a1, t1
add a1, a1, t3 # (a1 a0) += (t1 t0)
# return (64 - (x & 0x7f));
# a0 = (x & 0x7f)
andi a0, a0, 0x7f
li t0, 64
sub a0, t0, a0 # a0 = (64 - (x & 0x7f))
lw ra, 0(sp)
addi sp, sp, 4
ret
```
:::
### Makefile
To facilitate compilation, we can write a Makefile:
```c
.PHONY: clean
include ../../mk/toolchain.mk
ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv
BIN = Hammingdist.elf
%.o: %.S
$(CROSS_COMPILE)as -R $(ASFLAGS) -c -o $@ $<
all: $(BIN)
Hammingdist.elf: Hammingdist.o
$(CROSS_COMPILE)ld -o $@ -T Hammingdist.ld $(LDFLAGS) $<
clean:
$(RM) $(BIN) Hammingdist.o
```
* Because we use the `CSR` command, we need to add `_zicsr` in `ASFLAGS` additionally.
### Output
```c
$ ../../build/rv32emu Hammingdist.elf
Hamming Distance= 21
Hamming Distance= 63
Hamming Distance= 0
Cycle counts : 2729
inferior exit code 0
$ riscv-none-elf-readelf -h Hammingdist.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x0
Start of program headers: 52 (bytes into file)
Start of section headers: 5900 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 7
Section header string table index: 6
$ riscv-none-elf-size Hammingdist.elf
text data bss dec hex filename
929 0 0 929 3a1 Hammingdist.elf
```
:::spoiler **Disassemble**
```c
000000c8 <hd_func>:
c8: fdc10113 add sp,sp,-36
cc: 00112023 sw ra,0(sp)
d0: 00812223 sw s0,4(sp)
d4: 00912423 sw s1,8(sp)
d8: 01212623 sw s2,12(sp)
dc: 01312823 sw s3,16(sp)
e0: 01412a23 sw s4,20(sp)
e4: 01512c23 sw s5,24(sp)
e8: 01612e23 sw s6,28(sp)
ec: 03712023 sw s7,32(sp)
f0: 00050413 mv s0,a0
f4: 00058493 mv s1,a1
f8: 00042503 lw a0,0(s0)
fc: 00442583 lw a1,4(s0)
100: 0e4000ef jal 1e4 <clz>
104: 04000913 li s2,64
108: 40a90933 sub s2,s2,a0
10c: 0004a503 lw a0,0(s1)
110: 0044a583 lw a1,4(s1)
114: 0d0000ef jal 1e4 <clz>
118: 04000993 li s3,64
11c: 40a989b3 sub s3,s3,a0
120: 00042a03 lw s4,0(s0)
124: 00442a83 lw s5,4(s0)
128: 0004ab03 lw s6,0(s1)
12c: 0044ab83 lw s7,4(s1)
130: 013922b3 slt t0,s2,s3
134: 00029a63 bnez t0,148 <x1_larger>
138: 00000993 li s3,0
13c: 05204663 bgtz s2,188 <hd_cal_loop>
140: 00090513 mv a0,s2
144: 0180006f j 15c <hd_func_end>
00000148 <x1_larger>:
148: 00098913 mv s2,s3
14c: 00000993 li s3,0
150: 03204c63 bgtz s2,188 <hd_cal_loop>
154: 00090513 mv a0,s2
158: 0040006f j 15c <hd_func_end>
0000015c <hd_func_end>:
15c: 00012083 lw ra,0(sp)
160: 00412403 lw s0,4(sp)
164: 00812483 lw s1,8(sp)
168: 00c12903 lw s2,12(sp)
16c: 01012983 lw s3,16(sp)
170: 01412a03 lw s4,20(sp)
174: 01812a83 lw s5,24(sp)
178: 01c12b03 lw s6,28(sp)
17c: 02012b83 lw s7,32(sp)
180: 02410113 add sp,sp,36
184: 00008067 ret
00000188 <hd_cal_loop>:
188: 02000393 li t2,32
18c: 0123ca63 blt t2,s2,1a0 <hd_getLSB_upper>
190: 00100e13 li t3,1
194: 01ca7eb3 and t4,s4,t3
198: 01cb7f33 and t5,s6,t3
19c: 0100006f j 1ac <hd_cal_shift>
000001a0 <hd_getLSB_upper>:
1a0: 00100e13 li t3,1
1a4: 01cafeb3 and t4,s5,t3
1a8: 01cbff33 and t5,s7,t3
000001ac <hd_cal_shift>:
1ac: 001a5293 srl t0,s4,0x1
1b0: 01fa9313 sll t1,s5,0x1f
1b4: 0062ea33 or s4,t0,t1
1b8: 001ada93 srl s5,s5,0x1
1bc: 001b5293 srl t0,s6,0x1
1c0: 01fb9313 sll t1,s7,0x1f
1c4: 0062eb33 or s6,t0,t1
1c8: 001bdb93 srl s7,s7,0x1
1cc: 01ee8463 beq t4,t5,1d4 <hd_check_loop>
1d0: 00198993 add s3,s3,1
000001d4 <hd_check_loop>:
1d4: fff90913 add s2,s2,-1
1d8: fa0918e3 bnez s2,188 <hd_cal_loop>
1dc: 00098513 mv a0,s3
1e0: f7dff06f j 15c <hd_func_end>
000001e4 <clz>:
1e4: ffc10113 add sp,sp,-4
1e8: 00112023 sw ra,0(sp)
1ec: 02058a63 beqz a1,220 <clz_lower_set_one>
000001f0 <clz_upper_set_one>:
1f0: 0015d313 srl t1,a1,0x1
1f4: 0065e5b3 or a1,a1,t1
1f8: 0025d313 srl t1,a1,0x2
1fc: 0065e5b3 or a1,a1,t1
200: 0045d313 srl t1,a1,0x4
204: 0065e5b3 or a1,a1,t1
208: 0085d313 srl t1,a1,0x8
20c: 0065e5b3 or a1,a1,t1
210: 0105d313 srl t1,a1,0x10
214: 0065e5b3 or a1,a1,t1
218: fff00513 li a0,-1
21c: 02c0006f j 248 <clz_count_ones>
00000220 <clz_lower_set_one>:
220: 00155293 srl t0,a0,0x1
224: 00556533 or a0,a0,t0
228: 00255293 srl t0,a0,0x2
22c: 00556533 or a0,a0,t0
230: 00455293 srl t0,a0,0x4
234: 00556533 or a0,a0,t0
238: 00855293 srl t0,a0,0x8
23c: 00556533 or a0,a0,t0
240: 01055293 srl t0,a0,0x10
244: 00556533 or a0,a0,t0
00000248 <clz_count_ones>:
248: 00155293 srl t0,a0,0x1
24c: 01f59313 sll t1,a1,0x1f
250: 0062e2b3 or t0,t0,t1
254: 0015d313 srl t1,a1,0x1
258: 555553b7 lui t2,0x55555
25c: 55538393 add t2,t2,1365 # 55555555 <msg2+0x555551c3>
260: 0072f2b3 and t0,t0,t2
264: 00737333 and t1,t1,t2
268: 00553e33 sltu t3,a0,t0
26c: 40550533 sub a0,a0,t0
270: 406585b3 sub a1,a1,t1
274: 41c585b3 sub a1,a1,t3
278: 00255293 srl t0,a0,0x2
27c: 01e59313 sll t1,a1,0x1e
280: 0062e2b3 or t0,t0,t1
284: 0025d313 srl t1,a1,0x2
288: 333333b7 lui t2,0x33333
28c: 33338393 add t2,t2,819 # 33333333 <msg2+0x33332fa1>
290: 0072f2b3 and t0,t0,t2
294: 00737333 and t1,t1,t2
298: 00757eb3 and t4,a0,t2
29c: 0075ff33 and t5,a1,t2
2a0: 01d28533 add a0,t0,t4
2a4: 00553e33 sltu t3,a0,t0
2a8: 01e305b3 add a1,t1,t5
2ac: 01c585b3 add a1,a1,t3
2b0: 00455293 srl t0,a0,0x4
2b4: 01c59313 sll t1,a1,0x1c
2b8: 0062e2b3 or t0,t0,t1
2bc: 0045d313 srl t1,a1,0x4
2c0: 00a282b3 add t0,t0,a0
2c4: 00a2be33 sltu t3,t0,a0
2c8: 00b30333 add t1,t1,a1
2cc: 01c30333 add t1,t1,t3
2d0: 0f0f13b7 lui t2,0xf0f1
2d4: f0f38393 add t2,t2,-241 # f0f0f0f <msg2+0xf0f0b7d>
2d8: 0072f533 and a0,t0,t2
2dc: 007375b3 and a1,t1,t2
2e0: 00855293 srl t0,a0,0x8
2e4: 01859313 sll t1,a1,0x18
2e8: 0062e2b3 or t0,t0,t1
2ec: 0085d313 srl t1,a1,0x8
2f0: 00550533 add a0,a0,t0
2f4: 00553e33 sltu t3,a0,t0
2f8: 006585b3 add a1,a1,t1
2fc: 01c585b3 add a1,a1,t3
300: 01055293 srl t0,a0,0x10
304: 01059313 sll t1,a1,0x10
308: 0062e2b3 or t0,t0,t1
30c: 0105d313 srl t1,a1,0x10
310: 00550533 add a0,a0,t0
314: 00553e33 sltu t3,a0,t0
318: 006585b3 add a1,a1,t1
31c: 01c585b3 add a1,a1,t3
320: 00058293 mv t0,a1
324: 00000313 li t1,0
328: 00550533 add a0,a0,t0
32c: 00553e33 sltu t3,a0,t0
330: 006585b3 add a1,a1,t1
334: 01c585b3 add a1,a1,t3
338: 07f57513 and a0,a0,127
33c: 04000293 li t0,64
340: 40a28533 sub a0,t0,a0
344: 00012083 lw ra,0(sp)
348: 00410113 add sp,sp,4
34c: 00008067 ret
```
:::
### Optimize C code / Handwritten Assembly Code
#### Handwritten Assembly Code version1
I delete the redundant code in `hd_cal_loop` of original code.
```diff=
# hamming distance calculation (result save in a0, a1)
hd_cal_loop:
- # when the current digit larger than 32
- addi t2, zero, 32
- bgt s2, t2, hd_getLSB_upper
# hd_getLSB_lower : and with 1
li t3, 0x00000001
and t4, s4, t3
and t5, s6, t3
j hd_cal_shift
-hd_getLSB_upper:
- # and with 1
- li t3, 0x00000001
- and t4, s5, t3
- and t5, s7, t3
hd_cal_shift:
# (s5 s4) = x >> 1
srli t0, s4, 1
slli t1, s5, 31
or s4, t0, t1 # s4 >> 1
srli s5, s5, 1 # s5 >> 1
# (s7 s6) = x >> 1
srli t0, s6, 1
slli t1, s7, 31
or s6, t0, t1 # s6 >> 1
srli s7, s7, 1 # s7 >> 1
beq t4, t5, hd_check_loop
addi s3, s3, 1
```
#### C code version 1
I made some slight modifications to the code to make it more concise and efficient.
```diff=
int HammingDistance(uint64_t x0, uint64_t x1){
int Hdist = 0;
int16_t max_digit = 64 - (int16_t)count_leading_zeros((x0 > x1)? x0 : x1);
+ uint64_t c1 = x0 ^ x1;
while(max_digit > 0){
- uint64_t c1 = x0 & 1;
- uint64_t c2 = x1 & 1;
- if(c1 != c2) Hdist += 1;
- x0 = x0 >> 1;
- x1 = x1 >> 1;
+ Hdist += c1 & 1;
+ c1 = c1 >> 1;
max_digit -= 1;
}
return Hdist;
}
```
#### Handwritten Assembly code version2
Based on the modified code, I made changes to my assembly code.
```diff=
# hamming distance calculation (result save in a0, a1)
+hd_cal_xor:
+ xor s4, s4, s6
+ xor s5, s5, s7 #c1=(s5 s4)
hd_cal_loop:
# hd_getLSB_lower : and with 1
li t3, 0x00000001
and t4, s4, t3
- and t5, s6, t3
j hd_cal_shift
hd_cal_shift:
# (s5 s4) = x >> 1
srli t0, s4, 1
slli t1, s5, 31
or s4, t0, t1 # s4 >> 1
srli s5, s5, 1 # s5 >> 1
- #(s7 s6) = x >> 1
- srli t0, s6, 1
- slli t1, s7, 31
- or s6, t0, t1 # s6 >> 1
- srli s7, s7, 1 # s7 >> 1
- beq t4, t5, hd_check_loop
+ bne t4, t3, hd_check_loop
addi s3, s3, 1
hd_check_loop:
addi s2, s2, -1
bne s2, zero, hd_cal_loop
mv a0, s3 # save return value to a0
j hd_func_end
```
| Cycles | Original C code | C code version1 |
| ------ | --------------- | --------------- |
| -O0 | 11190 | 8594 |
| -O1 | 7198 | 6614 |
| -O2 | 7435 | 6851 |
| -O3 | 7363 | 6793 |
| -Os | 7400 | 6732 |
| -Ofast | 7363 | 6793 |
| | Handwritten assembly code | Assembly code v1 | Assembly code v2 |
|:------:|:---------------------:|:----------------:|:----------------:|
| Cycles | 2729 | 2526 | 1942 |
## Reference
* [Options That Control Optimization](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html)
* [text, data, bss, and dec](https://mirzafahad.github.io/2021-05-08-text-data-bss/)
* [An Introduction to Makefiles](https://www.gnu.org/software/make/manual/html_node/Introduction.html)