# Assignment2: GNU Toolchain
Contributed by [Yan-You Chen](https://github.com/y0y0alex/CA/tree/main/homework2)

## Install Ubuntu on VirtualBox
My computer runs Windows, so I use VirtualBox with Ubuntu to run rv32emu.

### The issues encountered during the installation
The first problem I encountered was a version problem. I installed ==Ubuntu Linux 20.04-LTS== on VirtualBox, following the [install suggestion](https://hackmd.io/@sysprog/SJAR5XMmi). After running ==make== in rv32emu, the next step is ==make check==, but it failed.

#### Failed message
```
ca@ca-VirtualBox:~rc32emu$ make check
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Running hello.elf ... Failed.
make: *** [Makefile: 153 : check] Error 1
```
After searching online and asking the professor, I learned that Ubuntu should be updated to ==Ubuntu Linux 22.04== or a later version.

## Basic instructions for using RISC-V gcc
The gcc used with rv32emu is ==riscv-none-elf-gcc==.
* Set the environment variables
> cd $HOME
> source riscv-none-elf-gcc/setenv
* Compile
> riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O3 -o test test.c
* Display the assembler mnemonics for the machine instructions and store them in a .txt file
> riscv-none-elf-objdump -d test > test.txt
* Check the ELF header
> riscv-none-elf-readelf -h ./test
* Check the ELF size
> riscv-none-elf-size ./test
* Run the code
> build/rv32emu test

## What you should know before modifying the code
To measure a program with the CSRs (control and status registers) that count cycles and retired instructions, we build on [perfcounter](https://github.com/sysprog21/rv32emu/tree/master/tests/perfcounter). So we first need to understand what perfcounter does.
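As background for the perfcounter helpers: on RV32 the 64-bit cycle counter is exposed as two 32-bit CSRs, `cycle` (low half) and `cycleh` (high half), so a reader has to guard against the low half wrapping between the two reads. The following is a minimal sketch of that read-high, read-low, re-read-high pattern in portable C so that it can be tried on a host machine. The names `read_counter64`, `fake_read_hi`, `fake_read_lo`, and `fake_counter` are hypothetical and only simulate the CSR reads; they are not part of rv32emu.

```c
#include <stdint.h>

/* Hypothetical stand-in for a `csrr` read of one 32-bit counter half. */
typedef uint32_t (*read_csr_fn)(void);

/* Combine two 32-bit halves into one 64-bit count.
 * If the high half changed while the low half was being read,
 * the low half wrapped in between, so the read is retried. */
uint64_t read_counter64(read_csr_fn read_hi, read_csr_fn read_lo)
{
    uint32_t hi, lo, hi2;
    do {
        hi = read_hi();
        lo = read_lo();
        hi2 = read_hi();
    } while (hi != hi2);
    return ((uint64_t) hi << 32) | lo;
}

/* Simulated counter (assumption for host testing only): it starts just
 * below a 32-bit wrap and advances by one tick on every read. */
uint64_t fake_counter = 0xFFFFFFFFu;

uint32_t fake_read_hi(void)
{
    return (uint32_t) (fake_counter++ >> 32);
}

uint32_t fake_read_lo(void)
{
    return (uint32_t) fake_counter++;
}
```

With the simulated counter, the first pass sees the high half change from 0 to 1 and retries; without the retry the function would combine a stale high half with a wrapped low half.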
#### perfcounter/main.c
```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* These functions live in the same perfcounter directory and read the CSRs. */
extern uint64_t get_cycles();
extern uint64_t get_instret();

int main(void)
{
    /* measure cycles */
    uint64_t instret = get_instret();
    uint64_t oldcount = get_cycles();
    /* Put the C code you want to measure here. */
    uint64_t cyclecount = get_cycles() - oldcount;

    printf("cycle count: %u\n", (unsigned int) cyclecount);
    /* Note: this prints the instret value sampled before the measured code. */
    printf("instret: %x\n", (unsigned) (instret & 0xffffffff));
    return 0;
}
```
This is the base template I adapted from perfcounter to read the CSR counters. Another important file to check is the Makefile.

#### Makefile
```makefile
.PHONY: clean

include ../../mk/toolchain.mk

# Change the optimization level here (-O1, -O2, -Ofast, ...).
CFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32 -O1 -Wall

# Add any extra object files to this list.
OBJS = \
    getcycles.o \
    getinstret.o \
    main.o

# The final executable that is generated.
BIN = perfcount.elf

%.o: %.S
	$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<

%.o: %.c
	$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<

all: $(BIN)

$(BIN): $(OBJS)
	$(CROSS_COMPILE)gcc -o $@ $^

clean:
	$(RM) $(BIN) $(OBJS)
```
After you finish those steps, you can start working.

## Operation steps example
### 1. cd to the directory where you placed the code
<s>

![](https://hackmd.io/_uploads/S1zKPTjZ6.png)

</s>

```
ca@ca-VirtualBox:~rv32emu/tests/hw2$ ls
getcycle.S getinstret.S main.c Makefile
```
:::warning
:warning: Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
:notes: jserv
:::

### 2. Use "make" to compile your code
![](https://hackmd.io/_uploads/S13lO6obT.png)

The Makefile compiles the files for you.

### 3. Execute the .elf file to check the result
![](https://hackmd.io/_uploads/SyFTd6iZp.png)

### 4. If you want to remove the generated files, use "make clean"
![](https://hackmd.io/_uploads/SkULYpjZT.png)

---

## Choose a Question
* Problem: I chose **Implement log base power of 2 with CLZ** from [洪胤勛](https://hackmd.io/@KXkA4u0LQuyNTwOorDw2RA/HJSryZ4g6)
* Motivation: It is interesting that a basic math operation such as a logarithm can be computed with bit shifts.

### The original code
```c
#include <stdint.h>
#include <stdio.h>

uint16_t count_leading_zeros(uint64_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);

    /* count ones (population count) */
    x -= ((x >> 1) & 0x5555555555555555);
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
    x += (x >> 8);
    x += (x >> 16);
    x += (x >> 32);

    return (64 - (x & 0x7f));
}

// log base power of 2
uint16_t logp2(int power, uint16_t clz)
{
    uint16_t result = 0;
    int tmp = 64 - clz;
    while (1) {
        tmp -= power;
        if (tmp <= 0)
            break;
        result++;
    }
    return result;
}

int main(int argc, char* argv[])
{
    uint64_t a = 64;
    uint16_t clz = count_leading_zeros(a);
    printf("%d\n", logp2(2, clz));
    return 0;
}
```

### Code modified to read the CSRs
```c
#include <stdint.h>
#include <stdio.h>

extern uint64_t get_cycles();
extern uint64_t get_instret();

uint16_t count_leading_zeros(uint64_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);

    /* count ones (population count) */
    x -= ((x >> 1) & 0x5555555555555555);
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
    x += (x >> 8);
    x += (x >> 16);
    x += (x >> 32);

    return (64 - (x & 0x7f));
}

// log base power of 2
uint16_t logp2(int power, uint16_t clz)
{
    uint16_t result = 0;
    int tmp = 64 - clz;
    while (1) {
        tmp -= power;
        if (tmp <= 0)
            break;
        result++;
    }
    return result;
}

int main(void)
{
    /* measure cycles */
    uint64_t instret = get_instret();
    uint64_t oldcount = get_cycles();

    uint64_t a = 64;
    uint16_t clz = count_leading_zeros(a);
    uint16_t ans =
        logp2(1, clz);

    uint64_t cyclecount = get_cycles() - oldcount;

    printf("cycle count: %u\n", (unsigned int) cyclecount);
    printf("instret: %x\n", (unsigned) (instret & 0xffffffff));
    /* a is unsigned 64-bit, so print it with %llu and a matching cast */
    printf("Input data is : %llu\n", (unsigned long long) a);
    printf("The log based 2 is : %d\n", ans);
    return 0;
}
```

## **Compare Assembly Code**
```
All tests use the input 64 and log base 2.
```

## -O1 Optimized Assembly Code
### Assembly code
```c
0001016c <count_leading_zeros>:
1016c: 01f59713 sll a4,a1,0x1f
10170: 00155793 srl a5,a0,0x1
10174: 00f767b3 or a5,a4,a5
10178: 0015d713 srl a4,a1,0x1
1017c: 00a7e533 or a0,a5,a0
10180: 00b765b3 or a1,a4,a1
10184: 01e59713 sll a4,a1,0x1e
10188: 00255793 srl a5,a0,0x2
1018c: 00f767b3 or a5,a4,a5
10190: 0025d613 srl a2,a1,0x2
10194: 00a7e533 or a0,a5,a0
10198: 00b66633 or a2,a2,a1
1019c: 01c61713 sll a4,a2,0x1c
101a0: 00455793 srl a5,a0,0x4
101a4: 00f767b3 or a5,a4,a5
101a8: 00465693 srl a3,a2,0x4
101ac: 00a7e733 or a4,a5,a0
101b0: 00c6e6b3 or a3,a3,a2
101b4: 01869613 sll a2,a3,0x18
101b8: 00875793 srl a5,a4,0x8
101bc: 00f667b3 or a5,a2,a5
101c0: 0086d613 srl a2,a3,0x8
101c4: 00e7e7b3 or a5,a5,a4
101c8: 00d66633 or a2,a2,a3
101cc: 01061713 sll a4,a2,0x10
101d0: 0107d693 srl a3,a5,0x10
101d4: 00d766b3 or a3,a4,a3
101d8: 01065713 srl a4,a2,0x10
101dc: 00f6e6b3 or a3,a3,a5
101e0: 00c76733 or a4,a4,a2
101e4: 00d766b3 or a3,a4,a3
101e8: 01f71613 sll a2,a4,0x1f
101ec: 0016d793 srl a5,a3,0x1
101f0: 00f667b3 or a5,a2,a5
101f4: 00175593 srl a1,a4,0x1
101f8: 55555637 lui a2,0x55555
101fc: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x55537805>
10200: 00c7f7b3 and a5,a5,a2
10204: 00c5f633 and a2,a1,a2
10208: 40f687b3 sub a5,a3,a5
1020c: 00f6b6b3 sltu a3,a3,a5
10210: 40c70733 sub a4,a4,a2
10214: 40d70733 sub a4,a4,a3
10218: 01e71613 sll a2,a4,0x1e
1021c: 0027d693 srl a3,a5,0x2
10220: 00d666b3 or a3,a2,a3
10224: 00275593 srl a1,a4,0x2
10228: 33333637 lui a2,0x33333
1022c: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155e3>
10230: 00c6f6b3 and a3,a3,a2
10234: 00c5f5b3
and a1,a1,a2 10238: 00c7f7b3 and a5,a5,a2 1023c: 00c77733 and a4,a4,a2 10240: 00f687b3 add a5,a3,a5 10244: 00d7b6b3 sltu a3,a5,a3 10248: 00e58733 add a4,a1,a4 1024c: 00e686b3 add a3,a3,a4 10250: 01c69613 sll a2,a3,0x1c 10254: 0047d713 srl a4,a5,0x4 10258: 00e66733 or a4,a2,a4 1025c: 0046d613 srl a2,a3,0x4 10260: 00f707b3 add a5,a4,a5 10264: 00e7b733 sltu a4,a5,a4 10268: 00d606b3 add a3,a2,a3 1026c: 00d70733 add a4,a4,a3 10270: 0f0f16b7 lui a3,0xf0f1 10274: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d31bf> 10278: 00d7f7b3 and a5,a5,a3 1027c: 00d77733 and a4,a4,a3 10280: 01871613 sll a2,a4,0x18 10284: 0087d693 srl a3,a5,0x8 10288: 00d666b3 or a3,a2,a3 1028c: 00875613 srl a2,a4,0x8 10290: 00f687b3 add a5,a3,a5 10294: 00d7b6b3 sltu a3,a5,a3 10298: 00e60733 add a4,a2,a4 1029c: 00e686b3 add a3,a3,a4 102a0: 01069613 sll a2,a3,0x10 102a4: 0107d713 srl a4,a5,0x10 102a8: 00e66733 or a4,a2,a4 102ac: 0106d613 srl a2,a3,0x10 102b0: 00f707b3 add a5,a4,a5 102b4: 00e7b733 sltu a4,a5,a4 102b8: 00d606b3 add a3,a2,a3 102bc: 00d70733 add a4,a4,a3 102c0: 00f70733 add a4,a4,a5 102c4: 07f77713 and a4,a4,127 102c8: 04000513 li a0,64 102cc: 40e50533 sub a0,a0,a4 102d0: 01051513 sll a0,a0,0x10 102d4: 01055513 srl a0,a0,0x10 102d8: 00008067 ret 000102dc <logp2>: 102dc: 00050713 mv a4,a0 102e0: 04000793 li a5,64 102e4: 40b785b3 sub a1,a5,a1 102e8: 40a585b3 sub a1,a1,a0 102ec: 02b05063 blez a1,1030c <logp2+0x30> 102f0: 00000513 li a0,0 102f4: 00150513 add a0,a0,1 102f8: 01051513 sll a0,a0,0x10 102fc: 01055513 srl a0,a0,0x10 10300: 40e585b3 sub a1,a1,a4 10304: feb048e3 bgtz a1,102f4 <logp2+0x18> 10308: 00008067 ret 1030c: 00000513 li a0,0 10310: 00008067 ret 00010314 <main>: 10314: ff010113 add sp,sp,-16 10318: 00112623 sw ra,12(sp) 1031c: 00812423 sw s0,8(sp) 10320: 00912223 sw s1,4(sp) 10324: 01212023 sw s2,0(sp) 10328: e31ff0ef jal 10158 <get_instret> 1032c: 00050493 mv s1,a0 10330: e15ff0ef jal 10144 <get_cycles> 10334: 00050913 mv s2,a0 10338: 03900593 li a1,57 1033c: 00100513 li 
a0,1
10340: f9dff0ef jal 102dc <logp2>
10344: 00050413 mv s0,a0
10348: dfdff0ef jal 10144 <get_cycles>
1034c: 412505b3 sub a1,a0,s2
10350: 0001c537 lui a0,0x1c
10354: c3850513 add a0,a0,-968 # 1bc38 <__clzsi2+0x72>
10358: 430000ef jal 10788 <printf>
1035c: 00048593 mv a1,s1
10360: 0001c537 lui a0,0x1c
10364: c4c50513 add a0,a0,-948 # 1bc4c <__clzsi2+0x86>
10368: 420000ef jal 10788 <printf>
1036c: 04000613 li a2,64
10370: 00000693 li a3,0
10374: 0001c537 lui a0,0x1c
10378: c5c50513 add a0,a0,-932 # 1bc5c <__clzsi2+0x96>
1037c: 40c000ef jal 10788 <printf>
10380: 00040593 mv a1,s0
10384: 0001c537 lui a0,0x1c
10388: c7450513 add a0,a0,-908 # 1bc74 <__clzsi2+0xae>
1038c: 3fc000ef jal 10788 <printf>
10390: 00000513 li a0,0
10394: 00c12083 lw ra,12(sp)
10398: 00812403 lw s0,8(sp)
1039c: 00412483 lw s1,4(sp)
103a0: 00012903 lw s2,0(sp)
103a4: 01010113 add sp,sp,16
103a8: 00008067 ret
```
### elf size
![](https://hackmd.io/_uploads/H1KgPJ3b6.png)
### execute
![](https://hackmd.io/_uploads/SyEYPyh-a.png)

* Observation:
    * Lines of code : `147`
    * Allocates `16` bytes on the stack
    * Registers used : `ra`, `sp`, `s0`~`s2`, `a0`~`a5`
    * Number of `lw` and `sw` : `4` and `4`
* Execution output:
    * cycle count : 48
    * instret : 0x2c6

## -O2 Optimized Assembly Code
### Assembly code
```c
000101ec <count_leading_zeros>:
101ec: 01f59713 sll a4,a1,0x1f
101f0: 00155793 srl a5,a0,0x1
101f4: 00f767b3 or a5,a4,a5
101f8: 0015d713 srl a4,a1,0x1
101fc: 00b765b3 or a1,a4,a1
10200: 00a7e533 or a0,a5,a0
10204: 01e59713 sll a4,a1,0x1e
10208: 00255793 srl a5,a0,0x2
1020c: 00f767b3 or a5,a4,a5
10210: 0025d713 srl a4,a1,0x2
10214: 00b76733 or a4,a4,a1
10218: 00a7e533 or a0,a5,a0
1021c: 01c71693 sll a3,a4,0x1c
10220: 00455793 srl a5,a0,0x4
10224: 00f6e7b3 or a5,a3,a5
10228: 00475693 srl a3,a4,0x4
1022c: 00e6e6b3 or a3,a3,a4
10230: 00a7e733 or a4,a5,a0
10234: 01869613 sll a2,a3,0x18
10238: 00875793 srl a5,a4,0x8
1023c: 00f667b3 or a5,a2,a5
10240: 0086d613 srl a2,a3,0x8
10244: 00d66633 or a2,a2,a3
10248:
00e7e7b3 or a5,a5,a4 1024c: 0107d693 srl a3,a5,0x10 10250: 01061713 sll a4,a2,0x10 10254: 00d766b3 or a3,a4,a3 10258: 01065713 srl a4,a2,0x10 1025c: 00c76733 or a4,a4,a2 10260: 00f6e6b3 or a3,a3,a5 10264: 00d766b3 or a3,a4,a3 10268: 01f71593 sll a1,a4,0x1f 1026c: 0016d793 srl a5,a3,0x1 10270: 55555637 lui a2,0x55555 10274: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x55537805> 10278: 00f5e7b3 or a5,a1,a5 1027c: 00c7f7b3 and a5,a5,a2 10280: 00175593 srl a1,a4,0x1 10284: 40f687b3 sub a5,a3,a5 10288: 00c5f633 and a2,a1,a2 1028c: 00f6b6b3 sltu a3,a3,a5 10290: 40c70733 sub a4,a4,a2 10294: 40d70733 sub a4,a4,a3 10298: 01e71593 sll a1,a4,0x1e 1029c: 0027d693 srl a3,a5,0x2 102a0: 33333637 lui a2,0x33333 102a4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155e3> 102a8: 00d5e6b3 or a3,a1,a3 102ac: 00c6f6b3 and a3,a3,a2 102b0: 00275593 srl a1,a4,0x2 102b4: 00c7f7b3 and a5,a5,a2 102b8: 00f687b3 add a5,a3,a5 102bc: 00c5f5b3 and a1,a1,a2 102c0: 00c77733 and a4,a4,a2 102c4: 00e58733 add a4,a1,a4 102c8: 00d7b6b3 sltu a3,a5,a3 102cc: 00e686b3 add a3,a3,a4 102d0: 01c69613 sll a2,a3,0x1c 102d4: 0047d713 srl a4,a5,0x4 102d8: 00e66733 or a4,a2,a4 102dc: 00f707b3 add a5,a4,a5 102e0: 0046d613 srl a2,a3,0x4 102e4: 00d60633 add a2,a2,a3 102e8: 00e7b733 sltu a4,a5,a4 102ec: 0f0f16b7 lui a3,0xf0f1 102f0: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d31bf> 102f4: 00c70733 add a4,a4,a2 102f8: 00d77733 and a4,a4,a3 102fc: 00d7f7b3 and a5,a5,a3 10300: 01871613 sll a2,a4,0x18 10304: 0087d693 srl a3,a5,0x8 10308: 00d666b3 or a3,a2,a3 1030c: 00f687b3 add a5,a3,a5 10310: 00875613 srl a2,a4,0x8 10314: 00e60733 add a4,a2,a4 10318: 00d7b6b3 sltu a3,a5,a3 1031c: 00e686b3 add a3,a3,a4 10320: 01069613 sll a2,a3,0x10 10324: 0107d713 srl a4,a5,0x10 10328: 00e66733 or a4,a2,a4 1032c: 00f707b3 add a5,a4,a5 10330: 0106d613 srl a2,a3,0x10 10334: 00e7b733 sltu a4,a5,a4 10338: 00d606b3 add a3,a2,a3 1033c: 00d70733 add a4,a4,a3 10340: 00f70733 add a4,a4,a5 10344: 07f77713 and a4,a4,127 10348: 
04000513 li a0,64
1034c: 40e50533 sub a0,a0,a4
10350: 01051513 sll a0,a0,0x10
10354: 01055513 srl a0,a0,0x10
10358: 00008067 ret

0001035c <logp2>:
1035c: 04000793 li a5,64
10360: 40b785b3 sub a1,a5,a1
10364: 40a585b3 sub a1,a1,a0
10368: 00050713 mv a4,a0
1036c: 00000513 li a0,0
10370: 00b05e63 blez a1,1038c <logp2+0x30>
10374: 00150513 add a0,a0,1
10378: 01051513 sll a0,a0,0x10
1037c: 40e585b3 sub a1,a1,a4
10380: 01055513 srl a0,a0,0x10
10384: feb048e3 bgtz a1,10374 <logp2+0x18>
10388: 00008067 ret
1038c: 00008067 ret

000100b0 <main>:
100b0: ff010113 add sp,sp,-16
100b4: 00112623 sw ra,12(sp)
100b8: 00812423 sw s0,8(sp)
100bc: 00912223 sw s1,4(sp)
100c0: 118000ef jal 101d8 <get_instret>
100c4: 00050413 mv s0,a0
100c8: 0fc000ef jal 101c4 <get_cycles>
100cc: 00050493 mv s1,a0
100d0: 0f4000ef jal 101c4 <get_cycles>
100d4: 409505b3 sub a1,a0,s1
100d8: 0001c537 lui a0,0x1c
100dc: c1850513 add a0,a0,-1000 # 1bc18 <__clzsi2+0x6e>
100e0: 68c000ef jal 1076c <printf>
100e4: 0001c537 lui a0,0x1c
100e8: 00040593 mv a1,s0
100ec: c2c50513 add a0,a0,-980 # 1bc2c <__clzsi2+0x82>
100f0: 67c000ef jal 1076c <printf>
100f4: 0001c537 lui a0,0x1c
100f8: 04000613 li a2,64
100fc: 00000693 li a3,0
10100: c3c50513 add a0,a0,-964 # 1bc3c <__clzsi2+0x92>
10104: 668000ef jal 1076c <printf>
10108: 0001c537 lui a0,0x1c
1010c: 00600593 li a1,6
10110: c5450513 add a0,a0,-940 # 1bc54 <__clzsi2+0xaa>
10114: 658000ef jal 1076c <printf>
10118: 00c12083 lw ra,12(sp)
1011c: 00812403 lw s0,8(sp)
10120: 00412483 lw s1,4(sp)
10124: 00000513 li a0,0
10128: 01010113 add sp,sp,16
1012c: 00008067 ret
```
### elf size
![](https://hackmd.io/_uploads/Hy0Ktkn-T.png)
### execute
![](https://hackmd.io/_uploads/BJsit13-6.png)

* Observation:
    * Lines of code : `140`
    * Allocates `16` bytes on the stack
    * Registers used : `ra`, `sp`, `s0`~`s1`, `a0`~`a5`
    * Number of `lw` and `sw` : `3` and `3`
* Execution output:
    * cycle count : 7
    * instret : 0x2c5

## -O3 Optimized Assembly Code
### Assembly code
```c
000101ec
<count_leading_zeros>: 101ec: 01f59713 sll a4,a1,0x1f 101f0: 00155793 srl a5,a0,0x1 101f4: 00f767b3 or a5,a4,a5 101f8: 0015d713 srl a4,a1,0x1 101fc: 00b765b3 or a1,a4,a1 10200: 00a7e533 or a0,a5,a0 10204: 01e59713 sll a4,a1,0x1e 10208: 00255793 srl a5,a0,0x2 1020c: 00f767b3 or a5,a4,a5 10210: 0025d713 srl a4,a1,0x2 10214: 00b76733 or a4,a4,a1 10218: 00a7e533 or a0,a5,a0 1021c: 01c71693 sll a3,a4,0x1c 10220: 00455793 srl a5,a0,0x4 10224: 00f6e7b3 or a5,a3,a5 10228: 00475693 srl a3,a4,0x4 1022c: 00e6e6b3 or a3,a3,a4 10230: 00a7e733 or a4,a5,a0 10234: 01869613 sll a2,a3,0x18 10238: 00875793 srl a5,a4,0x8 1023c: 00f667b3 or a5,a2,a5 10240: 0086d613 srl a2,a3,0x8 10244: 00d66633 or a2,a2,a3 10248: 00e7e7b3 or a5,a5,a4 1024c: 0107d693 srl a3,a5,0x10 10250: 01061713 sll a4,a2,0x10 10254: 00d766b3 or a3,a4,a3 10258: 01065713 srl a4,a2,0x10 1025c: 00c76733 or a4,a4,a2 10260: 00f6e6b3 or a3,a3,a5 10264: 00d766b3 or a3,a4,a3 10268: 01f71593 sll a1,a4,0x1f 1026c: 0016d793 srl a5,a3,0x1 10270: 55555637 lui a2,0x55555 10274: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x55537805> 10278: 00f5e7b3 or a5,a1,a5 1027c: 00c7f7b3 and a5,a5,a2 10280: 00175593 srl a1,a4,0x1 10284: 40f687b3 sub a5,a3,a5 10288: 00c5f633 and a2,a1,a2 1028c: 00f6b6b3 sltu a3,a3,a5 10290: 40c70733 sub a4,a4,a2 10294: 40d70733 sub a4,a4,a3 10298: 01e71593 sll a1,a4,0x1e 1029c: 0027d693 srl a3,a5,0x2 102a0: 33333637 lui a2,0x33333 102a4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155e3> 102a8: 00d5e6b3 or a3,a1,a3 102ac: 00c6f6b3 and a3,a3,a2 102b0: 00275593 srl a1,a4,0x2 102b4: 00c7f7b3 and a5,a5,a2 102b8: 00f687b3 add a5,a3,a5 102bc: 00c5f5b3 and a1,a1,a2 102c0: 00c77733 and a4,a4,a2 102c4: 00e58733 add a4,a1,a4 102c8: 00d7b6b3 sltu a3,a5,a3 102cc: 00e686b3 add a3,a3,a4 102d0: 01c69613 sll a2,a3,0x1c 102d4: 0047d713 srl a4,a5,0x4 102d8: 00e66733 or a4,a2,a4 102dc: 00f707b3 add a5,a4,a5 102e0: 0046d613 srl a2,a3,0x4 102e4: 00d60633 add a2,a2,a3 102e8: 00e7b733 sltu a4,a5,a4 102ec: 0f0f16b7 lui 
a3,0xf0f1 102f0: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d31bf> 102f4: 00c70733 add a4,a4,a2 102f8: 00d77733 and a4,a4,a3 102fc: 00d7f7b3 and a5,a5,a3 10300: 01871613 sll a2,a4,0x18 10304: 0087d693 srl a3,a5,0x8 10308: 00d666b3 or a3,a2,a3 1030c: 00f687b3 add a5,a3,a5 10310: 00875613 srl a2,a4,0x8 10314: 00e60733 add a4,a2,a4 10318: 00d7b6b3 sltu a3,a5,a3 1031c: 00e686b3 add a3,a3,a4 10320: 01069613 sll a2,a3,0x10 10324: 0107d713 srl a4,a5,0x10 10328: 00e66733 or a4,a2,a4 1032c: 00f707b3 add a5,a4,a5 10330: 0106d613 srl a2,a3,0x10 10334: 00e7b733 sltu a4,a5,a4 10338: 00d606b3 add a3,a2,a3 1033c: 00d70733 add a4,a4,a3 10340: 00f70733 add a4,a4,a5 10344: 07f77713 and a4,a4,127 10348: 04000513 li a0,64 1034c: 40e50533 sub a0,a0,a4 10350: 01051513 sll a0,a0,0x10 10354: 01055513 srl a0,a0,0x10 10358: 00008067 ret 0001035c <logp2>: 1035c: 04000793 li a5,64 10360: 40b785b3 sub a1,a5,a1 10364: 40a585b3 sub a1,a1,a0 10368: 00050713 mv a4,a0 1036c: 00000513 li a0,0 10370: 00b05e63 blez a1,1038c <logp2+0x30> 10374: 00150513 add a0,a0,1 10378: 01051513 sll a0,a0,0x10 1037c: 40e585b3 sub a1,a1,a4 10380: 01055513 srl a0,a0,0x10 10384: feb048e3 bgtz a1,10374 <logp2+0x18> 10388: 00008067 ret 1038c: 00008067 ret 000100b0 <main>: 100b0: ff010113 add sp,sp,-16 100b4: 00112623 sw ra,12(sp) 100b8: 00812423 sw s0,8(sp) 100bc: 00912223 sw s1,4(sp) 100c0: 118000ef jal 101d8 <get_instret> 100c4: 00050413 mv s0,a0 100c8: 0fc000ef jal 101c4 <get_cycles> 100cc: 00050493 mv s1,a0 100d0: 0f4000ef jal 101c4 <get_cycles> 100d4: 409505b3 sub a1,a0,s1 100d8: 0001c537 lui a0,0x1c 100dc: c1850513 add a0,a0,-1000 # 1bc18 <__clzsi2+0x6e> 100e0: 68c000ef jal 1076c <printf> 100e4: 0001c537 lui a0,0x1c 100e8: 00040593 mv a1,s0 100ec: c2c50513 add a0,a0,-980 # 1bc2c <__clzsi2+0x82> 100f0: 67c000ef jal 1076c <printf> 100f4: 0001c537 lui a0,0x1c 100f8: 04000613 li a2,64 100fc: 00000693 li a3,0 10100: c3c50513 add a0,a0,-964 # 1bc3c <__clzsi2+0x92> 10104: 668000ef jal 1076c <printf> 10108: 0001c537 
lui a0,0x1c
1010c: 00600593 li a1,6
10110: c5450513 add a0,a0,-940 # 1bc54 <__clzsi2+0xaa>
10114: 658000ef jal 1076c <printf>
10118: 00c12083 lw ra,12(sp)
1011c: 00812403 lw s0,8(sp)
10120: 00412483 lw s1,4(sp)
10124: 00000513 li a0,0
10128: 01010113 add sp,sp,16
1012c: 00008067 ret
```
### elf size
<s>![](https://hackmd.io/_uploads/BJVnoyn-a.png)</s>

```
ca@ca-VirtualBox:~rv32emu/tests/hw2$ riscv-none-elf-size ./perfcount.elf
   text    data     bss     dec     hex filename
  51608    1876    1528   55012    d6e4 ./perfcount.elf
```
### execute
![](https://hackmd.io/_uploads/S1ipjk2W6.png)

:::warning
:warning: Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
:notes: jserv
:::

* Observation:
    * Lines of code : `140`
    * Allocates `16` bytes on the stack
    * Registers used : `ra`, `sp`, `s0`~`s1`, `a0`~`a5`
    * Number of `lw` and `sw` : `3` and `3`
* Execution output:
    * cycle count : 7
    * instret : 0x2c5

## -Os Optimized Assembly Code
### Assembly code
```c
00010204 <count_leading_zeros>:
10204: 01f59713 sll a4,a1,0x1f
10208: 00155793 srl a5,a0,0x1
1020c: 00f767b3 or a5,a4,a5
10210: 0015d713 srl a4,a1,0x1
10214: 00b765b3 or a1,a4,a1
10218: 00a7e533 or a0,a5,a0
1021c: 01e59713 sll a4,a1,0x1e
10220: 00255793 srl a5,a0,0x2
10224: 00f767b3 or a5,a4,a5
10228: 0025d613 srl a2,a1,0x2
1022c: 00b66633 or a2,a2,a1
10230: 00a7e533 or a0,a5,a0
10234: 01c61713 sll a4,a2,0x1c
10238: 00455793 srl a5,a0,0x4
1023c: 00f767b3 or a5,a4,a5
10240: 00465693 srl a3,a2,0x4
10244: 00a7e733 or a4,a5,a0
10248: 00c6e6b3 or a3,a3,a2
1024c: 01869613 sll a2,a3,0x18
10250: 00875793 srl a5,a4,0x8
10254: 00f667b3 or a5,a2,a5
10258: 0086d613 srl a2,a3,0x8
1025c: 00d66633 or a2,a2,a3
10260: 00e7e7b3 or a5,a5,a4
10264: 0107d693 srl a3,a5,0x10
10268: 01061713 sll a4,a2,0x10
1026c: 00d766b3 or a3,a4,a3
10270: 01065713 srl a4,a2,0x10
10274: 00c76733 or a4,a4,a2
10278: 00f6e6b3 or a3,a3,a5
1027c: 00d766b3 or a3,a4,a3
10280: 01f71613 sll a2,a4,0x1f
10284: 0016d793 srl a5,a3,0x1
10288: 00f667b3
or a5,a2,a5 1028c: 55555637 lui a2,0x55555 10290: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x55537805> 10294: 00175593 srl a1,a4,0x1 10298: 00c7f7b3 and a5,a5,a2 1029c: 40f687b3 sub a5,a3,a5 102a0: 00c5f633 and a2,a1,a2 102a4: 00f6b6b3 sltu a3,a3,a5 102a8: 40c70733 sub a4,a4,a2 102ac: 40d70733 sub a4,a4,a3 102b0: 01e71613 sll a2,a4,0x1e 102b4: 0027d693 srl a3,a5,0x2 102b8: 00d666b3 or a3,a2,a3 102bc: 33333637 lui a2,0x33333 102c0: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155e3> 102c4: 00c6f6b3 and a3,a3,a2 102c8: 00275593 srl a1,a4,0x2 102cc: 00c7f7b3 and a5,a5,a2 102d0: 00c5f5b3 and a1,a1,a2 102d4: 00f687b3 add a5,a3,a5 102d8: 00c77733 and a4,a4,a2 102dc: 00e58733 add a4,a1,a4 102e0: 00d7b6b3 sltu a3,a5,a3 102e4: 00e686b3 add a3,a3,a4 102e8: 01c69613 sll a2,a3,0x1c 102ec: 0047d713 srl a4,a5,0x4 102f0: 00e66733 or a4,a2,a4 102f4: 00f707b3 add a5,a4,a5 102f8: 0046d613 srl a2,a3,0x4 102fc: 00d606b3 add a3,a2,a3 10300: 00e7b733 sltu a4,a5,a4 10304: 00d70733 add a4,a4,a3 10308: 0f0f16b7 lui a3,0xf0f1 1030c: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d31bf> 10310: 00d77733 and a4,a4,a3 10314: 00d7f7b3 and a5,a5,a3 10318: 01871613 sll a2,a4,0x18 1031c: 0087d693 srl a3,a5,0x8 10320: 00d666b3 or a3,a2,a3 10324: 00f687b3 add a5,a3,a5 10328: 00875613 srl a2,a4,0x8 1032c: 00e60733 add a4,a2,a4 10330: 00d7b6b3 sltu a3,a5,a3 10334: 00e686b3 add a3,a3,a4 10338: 01069613 sll a2,a3,0x10 1033c: 0107d713 srl a4,a5,0x10 10340: 00e66733 or a4,a2,a4 10344: 00f707b3 add a5,a4,a5 10348: 0106d613 srl a2,a3,0x10 1034c: 00e7b733 sltu a4,a5,a4 10350: 00d606b3 add a3,a2,a3 10354: 00d70733 add a4,a4,a3 10358: 00f70733 add a4,a4,a5 1035c: 07f77713 and a4,a4,127 10360: 04000513 li a0,64 10364: 40e50533 sub a0,a0,a4 10368: 01051513 sll a0,a0,0x10 1036c: 01055513 srl a0,a0,0x10 10370: 00008067 ret 00010374 <logp2>: 10374: 04000793 li a5,64 10378: 00050713 mv a4,a0 1037c: 40b785b3 sub a1,a5,a1 10380: 00000513 li a0,0 10384: 40e585b3 sub a1,a1,a4 10388: 00b05a63 blez 
a1,1039c <logp2+0x28>
1038c: 00150793 add a5,a0,1
10390: 01079513 sll a0,a5,0x10
10394: 01055513 srl a0,a0,0x10
10398: fedff06f j 10384 <logp2+0x10>
1039c: 00008067 ret

000100b0 <main>:
100b0: ff010113 add sp,sp,-16
100b4: 00112623 sw ra,12(sp)
100b8: 00812423 sw s0,8(sp)
100bc: 00912223 sw s1,4(sp)
100c0: 01212023 sw s2,0(sp)
100c4: 12c000ef jal 101f0 <get_instret>
100c8: 00050493 mv s1,a0
100cc: 110000ef jal 101dc <get_cycles>
100d0: 00050913 mv s2,a0
100d4: 03900593 li a1,57
100d8: 00100513 li a0,1
100dc: 298000ef jal 10374 <logp2>
100e0: 00050413 mv s0,a0
100e4: 0f8000ef jal 101dc <get_cycles>
100e8: 412505b3 sub a1,a0,s2
100ec: 0001c537 lui a0,0x1c
100f0: c2850513 add a0,a0,-984 # 1bc28 <__clzsi2+0x6e>
100f4: 688000ef jal 1077c <printf>
100f8: 0001c537 lui a0,0x1c
100fc: 00048593 mv a1,s1
10100: c3c50513 add a0,a0,-964 # 1bc3c <__clzsi2+0x82>
10104: 678000ef jal 1077c <printf>
10108: 0001c537 lui a0,0x1c
1010c: 04000613 li a2,64
10110: 00000693 li a3,0
10114: c4c50513 add a0,a0,-948 # 1bc4c <__clzsi2+0x92>
10118: 664000ef jal 1077c <printf>
1011c: 0001c537 lui a0,0x1c
10120: 00040593 mv a1,s0
10124: c6450513 add a0,a0,-924 # 1bc64 <__clzsi2+0xaa>
10128: 654000ef jal 1077c <printf>
1012c: 00c12083 lw ra,12(sp)
10130: 00812403 lw s0,8(sp)
10134: 00412483 lw s1,4(sp)
10138: 00012903 lw s2,0(sp)
1013c: 00000513 li a0,0
10140: 01010113 add sp,sp,16
10144: 00008067 ret
```
### elf size
![](https://hackmd.io/_uploads/B1mLp12bp.png)
### execute
![](https://hackmd.io/_uploads/SJ8D61n-p.png)

* Observation:
    * Lines of code : `144`
    * Allocates `16` bytes on the stack
    * Registers used : `ra`, `sp`, `s0`~`s2`, `a0`~`a5`
    * Number of `lw` and `sw` : `4` and `4`
* Execution output:
    * cycle count : 54
    * instret : 0x2c6

## -Ofast Optimized Assembly Code
### Assembly code
```c
000101ec <count_leading_zeros>:
101ec: 01f59713 sll a4,a1,0x1f
101f0: 00155793 srl a5,a0,0x1
101f4: 00f767b3 or a5,a4,a5
101f8: 0015d713 srl a4,a1,0x1
101fc: 00b765b3 or a1,a4,a1
10200: 00a7e533 or a0,a5,a0
10204:
01e59713 sll a4,a1,0x1e 10208: 00255793 srl a5,a0,0x2 1020c: 00f767b3 or a5,a4,a5 10210: 0025d713 srl a4,a1,0x2 10214: 00b76733 or a4,a4,a1 10218: 00a7e533 or a0,a5,a0 1021c: 01c71693 sll a3,a4,0x1c 10220: 00455793 srl a5,a0,0x4 10224: 00f6e7b3 or a5,a3,a5 10228: 00475693 srl a3,a4,0x4 1022c: 00e6e6b3 or a3,a3,a4 10230: 00a7e733 or a4,a5,a0 10234: 01869613 sll a2,a3,0x18 10238: 00875793 srl a5,a4,0x8 1023c: 00f667b3 or a5,a2,a5 10240: 0086d613 srl a2,a3,0x8 10244: 00d66633 or a2,a2,a3 10248: 00e7e7b3 or a5,a5,a4 1024c: 0107d693 srl a3,a5,0x10 10250: 01061713 sll a4,a2,0x10 10254: 00d766b3 or a3,a4,a3 10258: 01065713 srl a4,a2,0x10 1025c: 00c76733 or a4,a4,a2 10260: 00f6e6b3 or a3,a3,a5 10264: 00d766b3 or a3,a4,a3 10268: 01f71593 sll a1,a4,0x1f 1026c: 0016d793 srl a5,a3,0x1 10270: 55555637 lui a2,0x55555 10274: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x55537805> 10278: 00f5e7b3 or a5,a1,a5 1027c: 00c7f7b3 and a5,a5,a2 10280: 00175593 srl a1,a4,0x1 10284: 40f687b3 sub a5,a3,a5 10288: 00c5f633 and a2,a1,a2 1028c: 00f6b6b3 sltu a3,a3,a5 10290: 40c70733 sub a4,a4,a2 10294: 40d70733 sub a4,a4,a3 10298: 01e71593 sll a1,a4,0x1e 1029c: 0027d693 srl a3,a5,0x2 102a0: 33333637 lui a2,0x33333 102a4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155e3> 102a8: 00d5e6b3 or a3,a1,a3 102ac: 00c6f6b3 and a3,a3,a2 102b0: 00275593 srl a1,a4,0x2 102b4: 00c7f7b3 and a5,a5,a2 102b8: 00f687b3 add a5,a3,a5 102bc: 00c5f5b3 and a1,a1,a2 102c0: 00c77733 and a4,a4,a2 102c4: 00e58733 add a4,a1,a4 102c8: 00d7b6b3 sltu a3,a5,a3 102cc: 00e686b3 add a3,a3,a4 102d0: 01c69613 sll a2,a3,0x1c 102d4: 0047d713 srl a4,a5,0x4 102d8: 00e66733 or a4,a2,a4 102dc: 00f707b3 add a5,a4,a5 102e0: 0046d613 srl a2,a3,0x4 102e4: 00d60633 add a2,a2,a3 102e8: 00e7b733 sltu a4,a5,a4 102ec: 0f0f16b7 lui a3,0xf0f1 102f0: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d31bf> 102f4: 00c70733 add a4,a4,a2 102f8: 00d77733 and a4,a4,a3 102fc: 00d7f7b3 and a5,a5,a3 10300: 01871613 sll a2,a4,0x18 10304: 0087d693 
srl a3,a5,0x8 10308: 00d666b3 or a3,a2,a3 1030c: 00f687b3 add a5,a3,a5 10310: 00875613 srl a2,a4,0x8 10314: 00e60733 add a4,a2,a4 10318: 00d7b6b3 sltu a3,a5,a3 1031c: 00e686b3 add a3,a3,a4 10320: 01069613 sll a2,a3,0x10 10324: 0107d713 srl a4,a5,0x10 10328: 00e66733 or a4,a2,a4 1032c: 00f707b3 add a5,a4,a5 10330: 0106d613 srl a2,a3,0x10 10334: 00e7b733 sltu a4,a5,a4 10338: 00d606b3 add a3,a2,a3 1033c: 00d70733 add a4,a4,a3 10340: 00f70733 add a4,a4,a5 10344: 07f77713 and a4,a4,127 10348: 04000513 li a0,64 1034c: 40e50533 sub a0,a0,a4 10350: 01051513 sll a0,a0,0x10 10354: 01055513 srl a0,a0,0x10 10358: 00008067 ret 0001035c <logp2>: 1035c: 04000793 li a5,64 10360: 40b785b3 sub a1,a5,a1 10364: 40a585b3 sub a1,a1,a0 10368: 00050713 mv a4,a0 1036c: 00000513 li a0,0 10370: 00b05e63 blez a1,1038c <logp2+0x30> 10374: 00150513 add a0,a0,1 10378: 01051513 sll a0,a0,0x10 1037c: 40e585b3 sub a1,a1,a4 10380: 01055513 srl a0,a0,0x10 10384: feb048e3 bgtz a1,10374 <logp2+0x18> 10388: 00008067 ret 1038c: 00008067 ret 000100b0 <main>: 100b0: ff010113 add sp,sp,-16 100b4: 00112623 sw ra,12(sp) 100b8: 00812423 sw s0,8(sp) 100bc: 00912223 sw s1,4(sp) 100c0: 118000ef jal 101d8 <get_instret> 100c4: 00050413 mv s0,a0 100c8: 0fc000ef jal 101c4 <get_cycles> 100cc: 00050493 mv s1,a0 100d0: 0f4000ef jal 101c4 <get_cycles> 100d4: 409505b3 sub a1,a0,s1 100d8: 0001c537 lui a0,0x1c 100dc: c1850513 add a0,a0,-1000 # 1bc18 <__clzsi2+0x6e> 100e0: 68c000ef jal 1076c <printf> 100e4: 0001c537 lui a0,0x1c 100e8: 00040593 mv a1,s0 100ec: c2c50513 add a0,a0,-980 # 1bc2c <__clzsi2+0x82> 100f0: 67c000ef jal 1076c <printf> 100f4: 0001c537 lui a0,0x1c 100f8: 04000613 li a2,64 100fc: 00000693 li a3,0 10100: c3c50513 add a0,a0,-964 # 1bc3c <__clzsi2+0x92> 10104: 668000ef jal 1076c <printf> 10108: 0001c537 lui a0,0x1c 1010c: 00600593 li a1,6 10110: c5450513 add a0,a0,-940 # 1bc54 <__clzsi2+0xaa> 10114: 658000ef jal 1076c <printf> 10118: 00c12083 lw ra,12(sp) 1011c: 00812403 lw s0,8(sp) 10120: 00412483 lw 
s1,4(sp)
10124: 00000513 li a0,0
10128: 01010113 add sp,sp,16
1012c: 00008067 ret
```
### elf size
![](https://hackmd.io/_uploads/r1KSJx2-a.png)
### execute
![](https://hackmd.io/_uploads/HycLkgnZa.png)

* Observation:
    * Lines of code : `140`
    * Allocates `16` bytes on the stack
    * Registers used : `ra`, `sp`, `s0`~`s1`, `a0`~`a5`
    * Number of `lw` and `sw` : `3` and `3`
* Execution output:
    * cycle count : 7
    * instret : 0x2c5

## Handwritten Assembly Code
```
For the full code, see the file "handwrite.S" on GitHub.
```
### details
To check the result of the handwritten assembly code, we refer to the asm-hello directory and add some assembly code to run and print the measurement. Some of the details are listed below.
```
.global _start

.set STDOUT, 1
.set SYSEXIT, 93
.set SYSWRITE, 64
# ---------------------------
_start:
    jal get_cycles
    addi sp, sp, -4
    sw a0, 0(sp)            # save the starting cycle count

    # ...skip main things...

    li a7, SYSWRITE
    li a0, 1
    la a1, str_cycle
    li a2, 13
    ecall

    jal get_cycles
    lw t0, 0(sp)            # t0 = previous cycle count
    sub a0, a0, t0          # a0 = elapsed cycles
    addi sp, sp, 4
    li a1, 4
    jal print_ascii

    mv t0, a0
    li a0, 1
    la a1, buffer
    li a2, 4
    li a7, SYSWRITE
    ecall

    li a7, SYSWRITE
    li a0, 1
    la a1, endl
    li a2, 2
    ecall

    li a7, SYSEXIT          # "exit" syscall
    add a0, x0, 0           # Use 0 return code
    ecall                   # invoke syscall to terminate the program
# ------------------------
get_cycles:
    csrr a1, cycleh
    csrr a0, cycle          # low word goes in a0, which the caller reads
    csrr a2, cycleh
    bne a1, a2, get_cycles  # retry if the high word changed
    ret

print_ascii:
    mv t0, a0               # load integer
    li t1, 0                # t1 = quotient
    li t2, 0                # t2 = remainder
    li t3, 10               # t3 = divisor
    mv t4, a1               # t4 = number of digits left to write

check_less_then_ten:
    bge t0, t3, divide
    mv t2, t0
    mv t0, t1               # t0 = quotient
    j to_ascii

divide:
    sub t0, t0, t3
    addi t1, t1, 1
    j check_less_then_ten

to_ascii:
    addi t2, t2, 48         # remainder to ASCII
    la t5, buffer           # t5 = buffer addr
    addi t4, t4, -1
    add t5, t5, t4
    sb t2, 0(t5)
    beqz t4, convert_loop_done  # exit when the counter reaches 0
    li t1, 0                # refresh quotient
    j check_less_then_ten

convert_loop_done:
    ret
```
<s>

### elf size
![image.png](https://hackmd.io/_uploads/HJ6cMfgma.png)
### execute
![image.png](https://hackmd.io/_uploads/ByBd4fx7a.png)

</s>

:::warning
:warning: Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
:notes: jserv
:::

## Conclusion
* From -O1 to -O2, the listing gets shorter (147 → 140 lines), `main` saves one fewer `s` register (`s2` is no longer used), and it needs fewer `lw`/`sw` (4 → 3 each).
* From -O1 to -O2 the cycle count drops sharply: 48 → 7. In the -O2 listing, `main` no longer calls `count_leading_zeros` or `logp2`. The compiler folded the result into a constant (`li a1,6`), so the two `get_cycles` calls run back to back and the 7 cycles are mostly measurement overhead. At -O1 and -Os only `count_leading_zeros` is folded (`li a1,57`) and `logp2` is still called.
* -Os has the highest cycle count : 54.
* In this case, -O2, -O3 and -Ofast give the same observations (140 lines, cycle count 7).
* The handwritten version also reduces the cycle count a lot : 13.
* So it is faster than -O1 and -Os, but slower than -O2, -O3 and -Ofast.

:::warning
TODO: Revise the handwritten RISC-V assembly code.
:notes: jserv
:::
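
In the -O2, -O3 and -Ofast listings, `main` loads the answer directly with `li a1,6`, so nothing is computed between the two `get_cycles` calls. One common way to keep the computation inside the measured region is to read the input through a `volatile` object, which the compiler may not replace with a constant. The following is only a sketch under that assumption: `measured_log2` is a hypothetical helper name, and the cycle counts reported above were not re-measured with it.

```c
#include <stdint.h>

uint16_t count_leading_zeros(uint64_t x)
{
    /* smear the highest set bit into every lower position */
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);

    /* count ones (population count) */
    x -= ((x >> 1) & 0x5555555555555555);
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
    x += (x >> 8);
    x += (x >> 16);
    x += (x >> 32);

    return (64 - (x & 0x7f));
}

/* log base 2^power, computed from the leading-zero count */
uint16_t logp2(int power, uint16_t clz)
{
    uint16_t result = 0;
    int tmp = 64 - clz;
    while (1) {
        tmp -= power;
        if (tmp <= 0)
            break;
        result++;
    }
    return result;
}

/* Hypothetical helper: the volatile read hides the value 64 from the
 * optimizer, and the volatile store keeps the result from being dropped. */
uint16_t measured_log2(void)
{
    volatile uint64_t input = 64;
    uint16_t clz = count_leading_zeros(input);
    volatile uint16_t result = logp2(1, clz);
    return result;
}
```

In the modified `main`, the call to `measured_log2()` would sit between the two `get_cycles()` calls, in place of the constant `a = 64` computation.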