# Assignment2: RISC-V Toolchain contributed by < [`ChengChiTing`](https://github.com/ChengChiTing) > ###### tags: `RISC-V`, `jserv` ## Install rv32emu on Ubuntu When I follow the instructions in [Lab2: RISC-V RV32I[MACF] emulator with ELF support](https://hackmd.io/@sysprog/SJAR5XMmi),some error happened during "make" step. The Terminal response the message ``` /bin/sh: 1: cc: not found /bin/sh: 1: cc: not found check the file build/.config for configured items. /bin/sh: 1: cc: not found /bin/sh: 1: cc: not found cc build/map.o make: cc: Command not found make *** [Makefile:142: build/map.o] Error 127 ``` After googling, I found we can use this command to solve the problem ``` sudo apt-get install build-essential ``` After `make` and `make check` successed, rv32emu is installed on Ubuntu. >note! Everytime reset the Terminal, we have to execute the following command again : >$ cd $HOME >$ source riscv-none-elf-gcc/setenv ## Rewrite "A Novel Approach to IP Routing: CLZ for Prefix Match" I choose the subject from [范紘維](https://hackmd.io/@gV8IONkMS_a6aHt20QNuAg/S1OSux_ga) , because I also use clz in my first homework and I think I can use some method to optimize this solution ( e.g. use iterative or byte shift ) . Furthermore, I am interested in how to check the IP routing with clz. ### C code implementation The C code below is the original code contributed by <[fan1071221](https://github.com/fan1071221/Computer_Architecture_2023_hw1)> :::spoiler C code ```c= #include <stdio.h> #include <stdint.h> typedef struct { uint32_t ip_prefix; uint8_t prefix_length; uint16_t leading_zeros; } RouteEntry; uint16_t count_leading_zeros(uint32_t x) { if (x == 0) return 32; int n = 1; if ((x >> 16) == 0) { n += 16; x <<= 16; } if ((x >> 24) == 0) { n += 8; x <<= 8; } if ((x >> 28) == 0) { n += 4; x <<= 4; } if ((x >> 30) == 0) { n += 2; x <<= 2; } n = n - (x >> 31); return n; } int is_prefix_match(uint32_t target_ip, uint32_t prefix, uint8_t prefix_length) { uint32_t mask = ~((1 << (32 - prefix_length)) - 1); return (target_ip & mask) == prefix; } void find_best_match(uint32_t target_ip, RouteEntry* routes, int num_routes) { int best_match_index = -1; uint16_t max_leading_zeros = 0; printf("Target IP: %u.%u.%u.%u\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF); printf("CLZ values for each route:\n"); for (int i = 0; i < num_routes; i++) { if (is_prefix_match(target_ip, routes[i].ip_prefix, routes[i].prefix_length)) { uint32_t xor_result = target_ip ^ routes[i].ip_prefix; routes[i].leading_zeros = count_leading_zeros(xor_result); printf("Route %u.%u.%u.%u/%u: CLZ = %u\n", (routes[i].ip_prefix >> 24) & 0xFF, (routes[i].ip_prefix >> 16) & 0xFF, (routes[i].ip_prefix >> 8) & 0xFF, routes[i].ip_prefix & 0xFF, routes[i].prefix_length, routes[i].leading_zeros); if (routes[i].leading_zeros > max_leading_zeros) { max_leading_zeros = routes[i].leading_zeros; best_match_index = i; } } } if (best_match_index != -1) { printf("Best matching route for IP %u.%u.%u.%u is: %u.%u.%u.%u/%u\n\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF, (routes[best_match_index].ip_prefix >> 24) & 0xFF, (routes[best_match_index].ip_prefix >> 16) & 0xFF, (routes[best_match_index].ip_prefix >> 8) & 0xFF, routes[best_match_index].ip_prefix & 0xFF, routes[best_match_index].prefix_length); } else { printf("No matching route found for IP %u.%u.%u.%u\n\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF); } } int main() { RouteEntry routes[] = { {0x0A000000, 8, 0}, // 10.0.0.0/8 {0x0A010000, 16, 0}, // 10.1.0.0/16 {0x0A010100, 24, 0}, // 10.1.1.0/24 {0xC0A80000, 16, 0} // 192.168.0.0/16 }; int num_routes = sizeof(routes) / sizeof(routes[0]); uint32_t ips[] = { 0x0A010137, // 10.1.1.55 0x0A0000FF, // 10.0.0.255 0xC0A80001 // 192.168.0.1 }; int num_ips = sizeof(ips) / sizeof(ips[0]); for (int i = 0; i < num_ips; i++) { find_best_match(ips[i], routes, num_routes); } return 0; } ``` ::: #### Rewrite Count Leading Zeros I rewrite the count leading zero function in byte shift, and the code is shown below. ```c = uint16_t count_leading_zeros(uint32_t x) { if (x == 0) return 32; int n = 1; if ((x >> 16) == 0) { n += 16; x <<= 16; } if ((x >> 24) == 0) { n += 8; x <<= 8; } if ((x >> 28) == 0) { n += 4; x <<= 4; } if ((x >> 30) == 0) { n += 2; x <<= 2; } n = n - (x >> 31); return n; } ``` After we excute the code in C compiler, we can obtain the same result. ``` Target IP: 10.1.1.55 CLZ values for each route: Route 10.0.0.0/8: CLZ = 15 Route 10.1.0.0/16: CLZ = 23 Route 10.1.1.0/24: CLZ = 26 Best matching route for IP 10.1.1.55 is: 10.1.1.0/24 Target IP: 10.0.0.255 CLZ values for each route: Route 10.0.0.0/8: CLZ = 24 Best matching route for IP 10.0.0.255 is: 10.0.0.0/8 Target IP: 192.168.0.1 CLZ values for each route: Route 192.168.0.0/16: CLZ = 31 Best matching route for IP 192.168.0.1 is: 192.168.0.0/16 ``` ### Assembly code implement in Ripes I also rewrite the assembly code in byte shift version, and the code is shown below. :::spoiler Assembly code ```c .data routes: .word 0x0A000000, 8, 0 .word 0x0A010000, 16, 0 .word 0x0A010100, 24, 0 .word 0xC0A80000, 16, 0 ips: .word 0x0A010137 .word 0x0A0000FF .word 0xC0A80001 num_routes: .word 4 num_ips: .word 3 newline: .string "\n" best_match_ip: .word 0 best_match_prefix: .word 0 dot: .string "." slash: .string "/" .text .globl main main: # Load base addresses of routes, ips, num_routes, and num_ips into registers la a1, routes la t1, ips la t0, num_routes lw a2, 0(t0) la t0, num_ips lw a3, 0(t0) # Loop through each IP and find the best match ip_loop: beqz a3, exit # If no more IPs, exit the loop lw a0, 0(t1) # ips Load the current IP into a0 j find_best_match # Call the find_best_match function exit: li a7, 10 ecall count_leading_zeros: addi a4, zero, 1 srli s2, t2, 16 bne s2, zero, bs24 addi a4, a4, 16 slli t2, t2, 16 bs24: srli s2, t2, 24 bne s2, zero, bs28 addi a4, a4, 8 slli t2, t2, 8 bs28: srli s2, t2, 28 bne s2, zero, bs30 addi a4, a4, 4 slli t2, t2, 4 bs30: srli s2, t2, 30 bne s2, zero, bs31 addi a4, a4, 2 slli t2, t2, 2 bs31: srli t2, t2, 31 sub a4, a4, t2 ret is_prefix_match: # a0: target_ip # a1: prefix # a4: prefix_length # Calculate the mask based on prefix_length li t2, 32 # Load 32 into t2 sub t2, t2, a4 # Subtract prefix_length from 32 li t3, 1 # Load 1 into t3 sll t3, t3, t2 # Shift left 1 by the result of the subtraction addi t3, t3, -1 # Subtract 1 from the result to get a mask with leading zeros not t2, t3 # Bitwise NOT to get the mask # Apply the mask to target_ip and t2, a0, t2 # AND operation between target_ip and mask # Compare the result with prefix li a5, 0 # Set return value to 0 (no match) beq t2, t0, match # If they are equal, it's a match ret match: li a5, 1 # Set return value to 1 (match) ret find_best_match: # a0: target_ip # a1: base address of routes # a2: num_routes # Initialize best_match_index, max_leading_zeros, and route_index li t4, -1 # t4 will hold best_match_index li t5, 0 # t5 will hold max_leading_zeros li t6, 0 # t6 will hold route_index (number of routes traversed) li s3, 0 # s3 will hold best_match_ip li s4, 0 # s4 will hold best_match_prefix loop_routes: beqz a2, end_loop # If no more routes, end the loop # Load prefix and prefix_length from routes lw t0, 0(a1) # t0 holds ip_prefix lbu a4, 4(a1) # a3 holds prefix_length # Check if prefix matches jal ra, is_prefix_match beqz a5, skip_route # If no match, skip to next route # Calculate leading zeros xor t2, a0, t0 # XOR target_ip and ip_prefix jal ra, count_leading_zeros # Check if current route has more leading zeros than previous best blt t5, a4, update_best # If you reach here, it means the route matched but was not the best match # So, you can skip to the next route j skip_route skip_route: addi a1, a1, 12 # Move to the next route entry addi a2, a2, -1 # Decrement num_routes addi t6, t6, 1 # Increment route_index j loop_routes update_best: mv t5, a4 # Update max_leading_zeros mv t4, t6 # Update best_match_index with current route_index lw s3, 0(a1) # Save the best match IP from routes lbu s4, 4(a1) # Save the best match prefix from routes j skip_route print_ip: # s3 contains the IP to be printed # Extract and print the first octet srli s5, s3, 24 andi s5, s5, 0xFF mv a0, s5 li a7, 1 ecall # Print a dot la a0, dot li a7, 4 ecall # Extract and print the second octet srli s5, s3, 16 andi s5, s5, 0xFF mv a0, s5 li a7, 1 ecall la a0, dot # Print a dot li a7, 4 ecall srli s5, s3, 8 # Extract and print the third octet andi s5, s5, 0xFF mv a0, s5 li a7, 1 ecall # Print a dot la a0, dot li a7, 4 ecall andi s5, s3, 0xFF # Extract and print the fourth octet mv a0, s5 li a7, 1 ecall ret end_loop: mv a0, s3 # Print the best match IP jal ra, print_ip la a0, slash li a7, 4 ecall mv a0, s4 # Print the best match prefix li a7, 1 ecall la a0, newline li a7, 4 ecall li a2, 4 addi t1, t1, 4 # Move to the next IP addi a3, a3, -1 # Decrement the IP count la a1, routes j ip_loop ``` ::: #### Comparing original code and byte shift version in Ripes | Original | Byte shift | | -------- | ---------- | |![](https://hackmd.io/_uploads/BJaXJU5Mp.png)|![](https://hackmd.io/_uploads/B1Jl0Sqza.png)| ## Optimized by riscv-none-elf-gcc ### Makefile Before compilering the C code, we have to modify `Makefile` for our file first. I refered to the `Makefile` in chacha20 folder and rewrite it. The code shown below is my own `Makefile`. The `Makefile` tells `make` insruction how to compile and link a program. ``` .PHONY: clean include ../../../mk/toolchain.mk CFLAGS = -march=rv32i -mabi=ilp32 -Os OBJS = ipcheck.o BIN = ipcheck.elf %.o: %.s $(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $< %.o: %.c $(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $< all: $(BIN) $(BIN): $(OBJS) $(CROSS_COMPILE)gcc -o $@ $< clean: $(RM) $(BIN) $(OBJS) ``` After we excute `make` command, it will compiler C file (or Assembly file) to object file first, and then build the ELF file. If the compilation is sucessed, we wil get the following response. ``` riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -Os -c -o ipcheck.o ipcheck.s riscv-none-elf-gc -o ipcheck.elf ipcheck.o ``` ### Using GNU Toolchain Run ipcheck.elf using the command line below: ``` $ ../../build/rv32emu ipcheck.elf ``` Then we can check the ELF file by following three instructions. 1. [objdump](https://man7.org/linux/man-pages/man1/objdump.1.html) >-d : Display the assembler mnemonics for the machine instructions ``` $ riscv-none-elf-objdump -d ipcheck.elf ``` We will get the machine instructions below. :::spoiler Disassembly code ``` ipcheck.elf: file format elf32-littleriscv Disassembly of section .text: 00010094 <exit>: 10094: 1141 add sp,sp,-16 10096: 4581 li a1,0 10098: c422 sw s0,8(sp) 1009a: c606 sw ra,12(sp) 1009c: 842a mv s0,a0 1009e: 2bf000ef jal 10b5c <__call_exitprocs> 100a2: da01a783 lw a5,-608(gp) # 125b0 <__stdio_exit_handler> 100a6: c391 beqz a5,100aa <exit+0x16> 100a8: 9782 jalr a5 100aa: 8522 mv a0,s0 100ac: 023010ef jal 118ce <_exit> 000100b0 <register_fini>: 100b0: 00000793 li a5,0 100b4: c791 beqz a5,100c0 <register_fini+0x10> 100b6: 00001517 auipc a0,0x1 100ba: 91c50513 add a0,a0,-1764 # 109d2 <__libc_fini_array> 100be: a4ed j 103a8 <atexit> 100c0: 8082 ret 000100c2 <_start>: 100c2: 00002197 auipc gp,0x2 100c6: 74e18193 add gp,gp,1870 # 12810 <__global_pointer$> 100ca: d9c18513 add a0,gp,-612 # 125ac <completed.1> 100ce: 0e418613 add a2,gp,228 # 128f4 <__BSS_END__> 100d2: 8e09 sub a2,a2,a0 100d4: 4581 li a1,0 100d6: 14d000ef jal 10a22 <memset> 100da: 00001517 auipc a0,0x1 100de: 8f850513 add a0,a0,-1800 # 109d2 <__libc_fini_array> 100e2: 24d9 jal 103a8 <atexit> 100e4: 085000ef jal 10968 <__libc_init_array> 100e8: 4502 lw a0,0(sp) 100ea: 004c add a1,sp,4 100ec: 4601 li a2,0 100ee: 2899 jal 10144 <main> 100f0: b755 j 10094 <exit> 000100f2 <__do_global_dtors_aux>: 100f2: 1141 add sp,sp,-16 100f4: c422 sw s0,8(sp) 100f6: d9c18413 add s0,gp,-612 # 125ac <completed.1> 100fa: 00044783 lbu a5,0(s0) 100fe: c606 sw ra,12(sp) 10100: ef99 bnez a5,1011e <__do_global_dtors_aux+0x2c> 10102: 00000793 li a5,0 10106: cb89 beqz a5,10118 <__do_global_dtors_aux+0x26> 10108: 00002517 auipc a0,0x2 1010c: ef850513 add a0,a0,-264 # 12000 <__EH_FRAME_BEGIN__> 10110: 00000097 auipc ra,0x0 10114: 000000e7 jalr zero # 0 <str_dot_len-0x1> 10118: 4785 li a5,1 1011a: 00f40023 sb a5,0(s0) 1011e: 40b2 lw ra,12(sp) 10120: 4422 lw s0,8(sp) 10122: 0141 add sp,sp,16 10124: 8082 ret 00010126 <frame_dummy>: 10126: 00000793 li a5,0 1012a: cb99 beqz a5,10140 <frame_dummy+0x1a> 1012c: ddc18593 add a1,gp,-548 # 125ec <object.0> 10130: 00002517 auipc a0,0x2 10134: ed050513 add a0,a0,-304 # 12000 <__EH_FRAME_BEGIN__> 10138: 00000317 auipc t1,0x0 1013c: 00000067 jr zero # 0 <str_dot_len-0x1> 10140: 8082 ret ... ``` ::: 2. [readelf](https://man7.org/linux/man-pages/man1/readelf.1.html) >-h : Display the ELF file header ``` $ riscv-none-elf-readelf -h ipcheck.elf ``` We wil get the following output: ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x100c2 Start of program headers: 52 (bytes into file) Start of section headers: 16304 (bytes into file) Flags: 0x1 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 14 Section header string table index: 13 ``` 3. [size](https://man7.org/linux/man-pages/man1/size.1.html) >list the section sizes—and the total size—for each of the object or archive files objfile in its argument list. ``` $ riscv-none-elf-size ipcheck.elf ``` We wil get the following output: ``` text data bss dec hex filename 6444 1452 840 8736 2220 ipcheck.elf ``` Furthermore, we can check our register usage and instruction frequency by following two commands. >We have to use `make tool` commond in `rv32emu` folder, then we can use the `rv_histogram` command. 1. RV32 Target Instruction Frequency Histogram ``` $ ../../build/rv_histogram -a ipcheck.elf ``` 2. RV32 Target Register Usage Histogram ``` $ ../../build/rv_histogram -r ipcheck.elf ``` ## Performance Analysis We can add the [ticks.c](https://github.com/sysprog21/rv32emu/blob/master/tests/ticks.c) into our C code thus we can analyze the cycles. :::spoiler C code with ticks ```c= #include <inttypes.h> #include <stdint.h> #include <stdio.h> typedef uint64_t ticks; static inline ticks getticks(void) { uint64_t result; uint32_t l, h, h2; asm volatile( "rdcycleh %0\n" "rdcycle %1\n" "rdcycleh %2\n" "sub %0, %0, %2\n" "seqz %0, %0\n" "sub %0, zero, %0\n" "and %1, %1, %0\n" : "=r"(h), "=r"(l), "=r"(h2)); result = (((uint64_t) h) << 32) | ((uint64_t) l); return result; } typedef struct { uint32_t ip_prefix; uint8_t prefix_length; uint16_t leading_zeros; } RouteEntry; uint16_t count_leading_zeros(uint32_t x) { if (x == 0) return 32; int n = 1; if ((x >> 16) == 0) { n += 16; x <<= 16; } if ((x >> 24) == 0) { n += 8; x <<= 8; } if ((x >> 28) == 0) { n += 4; x <<= 4; } if ((x >> 30) == 0) { n += 2; x <<= 2; } n = n - (x >> 31); return n; } int is_prefix_match(uint32_t target_ip, uint32_t prefix, uint8_t prefix_length) { uint32_t mask = ~((1 << (32 - prefix_length)) - 1); return (target_ip & mask) == prefix; } void find_best_match(uint32_t target_ip, RouteEntry* routes, int num_routes) { int best_match_index = -1; uint16_t max_leading_zeros = 0; printf("Target IP: %u.%u.%u.%u\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF); printf("CLZ values for each route:\n"); for (int i = 0; i < num_routes; i++) { if (is_prefix_match(target_ip, routes[i].ip_prefix, routes[i].prefix_length)) { uint32_t xor_result = target_ip ^ routes[i].ip_prefix; routes[i].leading_zeros = count_leading_zeros(xor_result); printf("Route %u.%u.%u.%u/%u: CLZ = %u\n", (routes[i].ip_prefix >> 24) & 0xFF, (routes[i].ip_prefix >> 16) & 0xFF, (routes[i].ip_prefix >> 8) & 0xFF, routes[i].ip_prefix & 0xFF, routes[i].prefix_length, routes[i].leading_zeros); if (routes[i].leading_zeros > max_leading_zeros) { max_leading_zeros = routes[i].leading_zeros; best_match_index = i; } } } if (best_match_index != -1) { printf("Best matching route for IP %u.%u.%u.%u is: %u.%u.%u.%u/%u\n\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF, (routes[best_match_index].ip_prefix >> 24) & 0xFF, (routes[best_match_index].ip_prefix >> 16) & 0xFF, (routes[best_match_index].ip_prefix >> 8) & 0xFF, routes[best_match_index].ip_prefix & 0xFF, routes[best_match_index].prefix_length); } else { printf("No matching route found for IP %u.%u.%u.%u\n\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF); } } int main() { RouteEntry routes[] = { {0x0A000000, 8, 0}, // 10.0.0.0/8 {0x0A010000, 16, 0}, // 10.1.0.0/16 {0x0A010100, 24, 0}, // 10.1.1.0/24 {0xC0A80000, 16, 0} // 192.168.0.0/16 }; int num_routes = sizeof(routes) / sizeof(routes[0]); uint32_t ips[] = { 0x0A010137, // 10.1.1.55 0x0A0000FF, // 10.0.0.255 0xC0A80001 // 192.168.0.1 }; int num_ips = sizeof(ips) / sizeof(ips[0]); ticks t0 = getticks(); for (int i = 0; i < num_ips; i++) { find_best_match(ips[i], routes, num_routes); } ticks t1 = getticks(); printf("elapsed cycle: %" PRIu64 "\n", t1 - t0); return 0; } ``` ::: ### -O0 Optimized Result ELF file header ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x100c2 Start of program headers: 52 (bytes into file) Start of section headers: 69436 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` ELF file size ``` text data bss dec hex filename 52966 1876 1528 56370 dc32 ipcheck.elf ``` Elapsed cycle : 46375 Instruction Frequency Histogram : ![](https://hackmd.io/_uploads/rJgLWfpMa.png) Target Register Usage Histogram : ![](https://hackmd.io/_uploads/rJxb_bMTzT.png) ### -O1 Optimized Result ELF file header ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x100c2 Start of program headers: 52 (bytes into file) Start of section headers: 69436 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` ELF file size ``` text data bss dec hex filename 52274 1876 1528 55678 d97e ipcheck.elf ``` Elapsed cycle : 45408 Instruction Frequency Histogram : ![](https://hackmd.io/_uploads/BJ3WEm6z6.png) Target Register Usage Histogram : ![](https://hackmd.io/_uploads/BkbGEX6z6.png) ### -O2 Optimized Result ELF file header ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x100b2 Start of program headers: 52 (bytes into file) Start of section headers: 69452 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` ELF file size ``` text data bss dec hex filename 52306 1876 1528 55710 d99e ipcheck.elf ``` Elapsed cycle : 45403 Instruction Frequency Histogram : ![](https://hackmd.io/_uploads/HyeE476za.png) Target Register Usage Histogram : ![](https://hackmd.io/_uploads/Bk9VN7pz6.png) ### -O3 Optimized Result ELF file header ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x1061a Start of program headers: 52 (bytes into file) Start of section headers: 69860 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` ELF file size ``` text data bss dec hex filename 53528 1876 1528 56932 de64 ipcheck.elf ``` Elapsed cycle : 45174 Instruction Frequency Histogram : ![](https://hackmd.io/_uploads/Hk6S47TzT.png) Target Register Usage Histogram : ![](https://hackmd.io/_uploads/B1GLNmaz6.png) ### -Ofast Optimized Result ELF file header ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x1061a Start of program headers: 52 (bytes into file) Start of section headers: 69860 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` ELF file size ``` text data bss dec hex filename 53528 1876 1528 56932 de64 ipcheck.elf ``` Elapsed cycle : 45174 Instruction Frequency Histogram : ![](https://hackmd.io/_uploads/S1twEQTfa.png) Target Register Usage Histogram : ![](https://hackmd.io/_uploads/Hked4Xpfa.png) ### -Os Optimized Result ELF file header ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x1017e Start of program headers: 52 (bytes into file) Start of section headers: 69452 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` ELF file size ``` text data bss dec hex filename 52248 1876 1528 55652 d964 ipcheck.elf ``` Elapsed cycle : 45438 Instruction Frequency Histogram : ![](https://hackmd.io/_uploads/S1TK4mpzT.png) Target Register Usage Histogram : ![](https://hackmd.io/_uploads/BJH9VX6M6.png) ### Handwriting Assembly code [Rv32emu](https://github.com/sysprog21/rv32emu) and [Ripes](https://github.com/mortbopet/Ripes) may not work together, therefore we have to rewrite the assembly code before we compiler it. We can check the [docs/syscall.md](https://github.com/sysprog21/rv32emu/blob/master/docs/syscall.md) and [src/syscall.c](https://github.com/sysprog21/rv32emu/blob/master/src/syscall.c) in advance. For example: We have to change `exit` system call from ``` li a7, 10 # exit program ecall ``` into ``` li a0, 0 li a7, 93 # exit program ecall ``` The code below is my rewrited assembly code : :::spoiler Rewrite assembly code ```c .org 0 .global main # New syscall .set SYSEXIT, 93 .set SYSWRITE, 64 .data routes: .word 0x0A000000, 8, 0 .word 0x0A010000, 16, 0 .word 0x0A010100, 24, 0 .word 0xC0A80000, 16, 0 ips: .word 0x0A010137 .word 0x0A0000FF .word 0xC0A80001 num_routes: .word 4 num_ips: .word 3 nextline: .ascii "\n" .set str_next_len, .-nextline best_match_ip: .word 0 best_match_prefix: .word 0 dot: .ascii "." .set str_dot_len, .-dot slash: .ascii "/" .set str_slash_len, .-slash .text main: # Load base addresses of routes, ips, num_routes, and num_ips into registers la a1, routes la t1, ips la t0, num_routes lw a2, 0(t0) la t0, num_ips lw a3, 0(t0) # Loop through each IP and find the best match ip_loop: beqz a3, exit # If no more IPs, exit the loop lw a0, 0(t1) # ips Load the current IP into a0 j find_best_match # Call the find_best_match function exit: li a7, SYSEXIT addi a0, x0, 0 ecall count_leading_zeros: addi a4, zero, 1 srli s2, t2, 16 bne s2, zero, bs24 addi a4, a4, 16 slli t2, t2, 16 bs24: srli s2, t2, 24 bne s2, zero, bs28 addi a4, a4, 8 slli t2, t2, 8 bs28: srli s2, t2, 28 bne s2, zero, bs30 addi a4, a4, 4 slli t2, t2, 4 bs30: srli s2, t2, 30 bne s2, zero, bs31 addi a4, a4, 2 slli t2, t2, 2 bs31: srli t2, t2, 31 sub a4, a4, t2 ret is_prefix_match: # a0: target_ip # a1: prefix # a4: prefix_length # Calculate the mask based on prefix_length li t2, 32 # Load 32 into t2 sub t2, t2, a4 # Subtract prefix_length from 32 li t3, 1 # Load 1 into t3 sll t3, t3, t2 # Shift left 1 by the result of the subtraction addi t3, t3, -1 # Subtract 1 from the result to get a mask with leading zeros not t2, t3 # Bitwise NOT to get the mask # Apply the mask to target_ip and t2, a0, t2 # AND operation between target_ip and mask # Compare the result with prefix li a5, 0 # Set return value to 0 (no match) beq t2, t0, match # If they are equal, it's a match ret match: li a5, 1 # Set return value to 1 (match) ret find_best_match: # a0: target_ip # a1: base address of routes # a2: num_routes # Initialize best_match_index, max_leading_zeros, and route_index li t4, -1 # t4 will hold best_match_index li t5, 0 # t5 will hold max_leading_zeros li t6, 0 # t6 will hold route_index (number of routes traversed) li s3, 0 # s3 will hold best_match_ip li s4, 0 # s4 will hold best_match_prefix loop_routes: beqz a2, end_loop # If no more routes, end the loop # Load prefix and prefix_length from routes lw t0, 0(a1) # t0 holds ip_prefix lbu a4, 4(a1) # a3 holds prefix_length # Check if prefix matches jal ra, is_prefix_match beqz a5, skip_route # If no match, skip to next route # Calculate leading zeros xor t2, a0, t0 # XOR target_ip and ip_prefix jal ra, count_leading_zeros # Check if current route has more leading zeros than previous best blt t5, a4, update_best # If you reach here, it means the route matched but was not the best match # So, you can skip to the next route j skip_route skip_route: addi a1, a1, 12 # Move to the next route entry addi a2, a2, -1 # Decrement num_routes addi t6, t6, 1 # Increment route_index j loop_routes update_best: mv t5, a4 # Update max_leading_zeros mv t4, t6 # Update best_match_index with current route_index lw s3, 0(a1) # Save the best match IP from routes lbu s4, 4(a1) # Save the best match prefix from routes j skip_route print_ip: # s3 contains the IP to be printed # Extract and print the first octet srli s5, s3, 24 andi s5, s5, 0xFF addi sp, sp, -16 sw s5, 0(sp) sw a1, 4(sp) sw a2, 8(sp) sw ra, 12(sp) addi a1, sp, 0 li a7, SYSWRITE li a0, 1 li a2, 4 ecall # Print a dot li a7, SYSWRITE li a0, 1 la a1, dot li a2, str_dot_len ecall # Extract and print the second octet srli s5, s3, 16 andi s5, s5, 0xFF sw s5, 0(sp) addi a1, sp, 0 li a7, SYSWRITE li a0, 1 li a2, 4 ecall li a7, SYSWRITE li a0, 1 la a1, dot li a2, str_dot_len ecall srli s5, s3, 8 # Extract and print the third octet andi s5, s5, 0xFF sw s5, 0(sp) addi a1, sp, 0 li a7, SYSWRITE li a0, 1 li a2, 4 ecall # Print a dot li a7, SYSWRITE li a0, 1 la a1, dot li a2, str_dot_len ecall andi s5, s3, 0xFF # Extract and print the fourth octet sw s5, 0(sp) addi a1, sp, 0 li a7, SYSWRITE li a0, 1 li a2, 4 ecall lw a1, 4(sp) lw a2, 8(sp) lw ra, 12(sp) addi sp, sp, 16 ret end_loop: mv a0, s3 # Print the best match IP jal ra, print_ip addi sp, sp, -12 sw a1, 4(sp) sw a2, 8(sp) li a7, SYSWRITE li a0, 1 la a1, slash li a2, str_slash_len ecall sw s4, 0(sp) addi a1, sp, 0 li a7, SYSWRITE li a0, 1 li a2, 4 ecall li a7, SYSWRITE li a0, 1 la a1, nextline li a2, str_next_len ecall lw a1, 4(sp) lw a2, 8(sp) addi sp, sp, 12 li a2, 4 addi t1, t1, 4 # Move to the next IP addi a3, a3, -1 # Decrement the IP count la a1, routes j ip_loop ``` ::: But there are some trouble with displaying numbers when I excute the program. The numbers can't display correctly. ``` .../ .../ . ../ inferior exit code 0 ``` After googling, this [article](https://hackmd.io/@wanghanchi/rkOjWYqBj#srv32-%E4%B8%AD%E7%B7%A8%E5%AF%AB-assembly-code-%E6%B3%A8%E6%84%8F%E4%BA%8B%E9%A0%85) provide some methods to solve this problem. We can use `printf` commond to display integer. Thus, I rewrite the assembly code into printf version again. :::spoiler Printf version ```c .org 0 .global main # New syscall .set SYSEXIT, 93 .set SYSWRITE, 64 .data routes: .word 0x0A000000, 8, 0 .word 0x0A010000, 16, 0 .word 0x0A010100, 24, 0 .word 0xC0A80000, 16, 0 ips: .word 0x0A010137 .word 0x0A0000FF .word 0xC0A80001 iformat: .string "%d" num_routes: .word 4 num_ips: .word 3 nextline: .ascii "\n" .set str_next_len, .-nextline best_match_ip: .word 0 best_match_prefix: .word 0 dot: .ascii "." .set str_dot_len, .-dot slash: .ascii "/" .set str_slash_len, .-slash .text main: # Load base addresses of routes, ips, num_routes, and num_ips into registers la a1, routes la t1, ips la t0, num_routes lw a2, 0(t0) la t0, num_ips lw a3, 0(t0) # Loop through each IP and find the best match ip_loop: beqz a3, exit # If no more IPs, exit the loop lw a0, 0(t1) # ips Load the current IP into a0 j find_best_match # Call the find_best_match function exit: li a7, SYSEXIT addi a0, x0, 0 ecall count_leading_zeros: addi a4, zero, 1 srli s2, t2, 16 bne s2, zero, bs24 addi a4, a4, 16 slli t2, t2, 16 bs24: srli s2, t2, 24 bne s2, zero, bs28 addi a4, a4, 8 slli t2, t2, 8 bs28: srli s2, t2, 28 bne s2, zero, bs30 addi a4, a4, 4 slli t2, t2, 4 bs30: srli s2, t2, 30 bne s2, zero, bs31 addi a4, a4, 2 slli t2, t2, 2 bs31: srli t2, t2, 31 sub a4, a4, t2 ret is_prefix_match: # a0: target_ip # a1: prefix # a4: prefix_length # Calculate the mask based on prefix_length li t2, 32 # Load 32 into t2 sub t2, t2, a4 # Subtract prefix_length from 32 li t3, 1 # Load 1 into t3 sll t3, t3, t2 # Shift left 1 by the result of the subtraction addi t3, t3, -1 # Subtract 1 from the result to get a mask with leading zeros not t2, t3 # Bitwise NOT to get the mask # Apply the mask to target_ip and t2, a0, t2 # AND operation between target_ip and mask # Compare the result with prefix li a5, 0 # Set return value to 0 (no match) beq t2, t0, match # If they are equal, it's a match ret match: li a5, 1 # Set return value to 1 (match) ret find_best_match: # a0: target_ip # a1: base address of routes # a2: num_routes # Initialize best_match_index, max_leading_zeros, and route_index li t4, -1 # t4 will hold best_match_index li t5, 0 # t5 will hold max_leading_zeros li t6, 0 # t6 will hold route_index (number of routes traversed) li s3, 0 # s3 will hold best_match_ip li s4, 0 # s4 will hold best_match_prefix loop_routes: beqz a2, end_loop # If no more routes, end the loop # Load prefix and prefix_length from routes lw t0, 0(a1) # t0 holds ip_prefix lbu a4, 4(a1) # a3 holds prefix_length # Check if prefix matches jal ra, is_prefix_match beqz a5, skip_route # If no match, skip to next route # Calculate leading zeros xor t2, a0, t0 # XOR target_ip and ip_prefix jal ra, count_leading_zeros # Check if current route has more leading zeros than previous best blt t5, a4, update_best # If you reach here, it means the route matched but was not the best match # So, you can skip to the next route j skip_route skip_route: addi a1, a1, 12 # Move to the next route entry addi a2, a2, -1 # Decrement num_routes addi t6, t6, 1 # Increment route_index j loop_routes update_best: mv t5, a4 # Update max_leading_zeros mv t4, t6 # Update best_match_index with current route_index lw s3, 0(a1) # Save the best match IP from routes lbu s4, 4(a1) # Save the best match prefix from routes j skip_route print_ip: # s3 contains the IP to be printed # Extract and print the first octet srli s5, s3, 24 andi s5, s5, 0xFF addi sp, sp, -16 sw s5, 0(sp) sw a1, 4(sp) sw a2, 8(sp) sw ra, 12(sp) la a0, iformat lw a1, 0(sp) call printf # Print a dot li a7, SYSWRITE li a0, 1 la a1, dot li a2, str_dot_len ecall # Extract and print the second octet srli s5, s3, 16 andi s5, s5, 0xFF sw s5, 0(sp) la a0, iformat lw a1, 0(sp) call printf li a7, SYSWRITE li a0, 1 la a1, dot li a2, str_dot_len ecall srli s5, s3, 8 # Extract and print the third octet andi s5, s5, 0xFF sw s5, 0(sp) la a0, iformat lw a1, 0(sp) call printf # Print a dot li a7, SYSWRITE li a0, 1 la a1, dot li a2, str_dot_len ecall andi s5, s3, 0xFF # Extract and print the fourth octet sw s5, 0(sp) la a0, iformat lw a1, 0(sp) call printf lw a1, 4(sp) lw a2, 8(sp) lw ra, 12(sp) addi sp, sp, 16 ret end_loop: mv a0, s3 # Print the best match IP jal ra, print_ip addi sp, sp, -16 sw a1, 4(sp) sw a2, 8(sp) sw ra, 12(sp) li a7, SYSWRITE li a0, 1 la a1, slash li a2, str_slash_len ecall sw s4, 0(sp) la a0, iformat lw a1, 0(sp) call printf li a7, SYSWRITE li a0, 1 la a1, nextline li a2, str_next_len ecall lw a1, 4(sp) lw a2, 8(sp) lw ra, 12(sp) addi sp, sp, 16 li a2, 4 addi t1, t1, 4 # Move to the next IP addi a3, a3, -1 # Decrement the IP count la a1, routes j ip_loop ``` ::: However, it still can't print the result correctly. Rv32emu seems not support `printf` commond. Therefore, I decide to use the first assembly code to generate ELF file and analyze it. ELF file header ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x100c2 Start of program headers: 52 (bytes into file) Start of section headers: 16304 (bytes into file) Flags: 0x1, RVC, soft-float ABI Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 14 Section header string table index: 13 ``` ELF file size ``` text data bss dec hex filename 6444 1452 840 8736 2220 ipcheck.elf ``` cycle count : 666 Instruction Frequency Histogram : ![](https://hackmd.io/_uploads/rJ0Ud46f6.png) Target Register Usage Histogram : ![](https://hackmd.io/_uploads/B1wvO4aMp.png) ### Analyze Comparing handwriting assemble with assembly code generated by riscv-none-elf-gcc. | | Handwriting | O0 | O1 | O2 | O3 | Os | Ofast | | -------- | ----------- | -------- | -------- | -------- | -------- | -------- | -------- | | Size | 6444 | 52966 | 52274 | 52306 | 53528 | 52248 | 53528 | We can notice there is a tremendous difference in size between handwriting assembly code and generated code. Furthermore, Os optimized has smallest among the optimization levels. Specifies the degree to which the generated code is optimized for speed and binary size. The different among the optimization level are shown below. 1. ```None[-O0]``` Do not optimize. With this setting, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. 2. ```Fast[-O1]``` Optimizing compilation takes somewhat more time, and a lot more memory for a large function. With this setting, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time 3. ```Faster[-O2]``` The compiler performs nearly all supported optimizations that do not involve a space-speed tradeoff. With this setting, the compiler does not perform loop unrolling or function inlining, or register renaming. As compared to the 'Fast' setting, this setting increases both compilation time and the performance of the generated code. 3. ```Fastest[-O3]``` Turns on all optimizations specified by the 'Faster' setting and also turns on function inlining and register renaming options. This setting may result in a larger binary. 4. ```Fastest, Smallest[-Os]``` Optimize for size. This setting enables all 'Faster' optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. 5. ```Fastest, Aggressive Optimizations[-Ofast]``` This setting enables 'Fastest' but also enables aggressive optimizations that may break strict standards compliance but should work well on well-behaved code. We can also compare the cycles among different optimization level | | Handwriting | O0 | O1 | O2 | O3 | Os | Ofast | | -------- | ----------- | -------- | -------- | -------- | -------- | -------- | -------- | | cycles | 666 | 46375 | 45408 | 45403 | 45174 | 45438 | 45174 | We can notice that handwriting assembly code still use less cycles and Ofast optimized has smallest cycles among the optimization levels. However, as the optimization level decrease, it will use less cycles to excute the program. But the cycles of Os is worse than others. But there are only slight difference among the six optimization level Furthermore, I found whichever optimization level I chose, the text size and cycles of handwriting assembly code have always been the same. ### Other discovery I am interested in which C instruction spend the most cycles. After trying, I found `printf` spend the majority of the cycles. Thus, I decide to remove all `printf` in my C code and count the cycles. Without `printf` the cycles reduce obviously, and the results are shown below. | | O0 | O1 | O2 | O3 | Os | Ofast | | -------- | -------- | -------- | -------- | -------- | -------- | -------- | |Count cycles | 1135 | 356 | 279 | 7 | 429 | 7 | # Reference * [A Novel Approach to IP Routing: CLZ for Prefix Match](https://hackmd.io/@gV8IONkMS_a6aHt20QNuAg/S1OSux_ga) * [RISC-V RV32I[MACF] emulator with ELF support](https://hackmd.io/@sysprog/SJAR5XMmi) * [Makefile 語法和示範](https://hackmd.io/@sysprog/SySTMXPvl) * [rv32emu](https://github.com/sysprog21/rv32emu) * [srv32](https://hackmd.io/@wanghanchi/rkOjWYqBj#srv32-%E4%B8%AD%E7%B7%A8%E5%AF%AB-assembly-code-%E6%B3%A8%E6%84%8F%E4%BA%8B%E9%A0%85) * [Optimization Level](https://juejin.cn/post/6971278709873442852)