Try   HackMD

Assignment2: RISC-V Toolchain

contributed by < ChengChiTing >

tags: RISC-V, jserv

Install rv32emu on Ubuntu

When I follow the instructions in Lab2: RISC-V RV32I[MACF] emulator with ELF support,some error happened during "make" step. The Terminal response the message

/bin/sh: 1: cc: not found
/bin/sh: 1: cc: not found
check the file build/.config for configured items.
/bin/sh: 1: cc: not found
/bin/sh: 1: cc: not found
  cc    build/map.o
make: cc: Command not found
make *** [Makefile:142: build/map.o] Error 127

After googling, I found we can use this command to solve the problem

sudo apt-get install build-essential

After make and make check successed, rv32emu is installed on Ubuntu.

note! Everytime reset the Terminal, we have to execute the following command again :
$ cd $HOME
$ source riscv-none-elf-gcc/setenv

Rewrite "A Novel Approach to IP Routing: CLZ for Prefix Match"

I choose the subject from 范紘維 , because I also use clz in my first homework and I think I can use some method to optimize this solution ( e.g. use iterative or byte shift ) . Furthermore, I am interested in how to check the IP routing with clz.

C code implementation

The C code below is the original code contributed by <fan1071221>

C code
#include <stdio.h> #include <stdint.h> typedef struct { uint32_t ip_prefix; uint8_t prefix_length; uint16_t leading_zeros; } RouteEntry; uint16_t count_leading_zeros(uint32_t x) { if (x == 0) return 32; int n = 1; if ((x >> 16) == 0) { n += 16; x <<= 16; } if ((x >> 24) == 0) { n += 8; x <<= 8; } if ((x >> 28) == 0) { n += 4; x <<= 4; } if ((x >> 30) == 0) { n += 2; x <<= 2; } n = n - (x >> 31); return n; } int is_prefix_match(uint32_t target_ip, uint32_t prefix, uint8_t prefix_length) { uint32_t mask = ~((1 << (32 - prefix_length)) - 1); return (target_ip & mask) == prefix; } void find_best_match(uint32_t target_ip, RouteEntry* routes, int num_routes) { int best_match_index = -1; uint16_t max_leading_zeros = 0; printf("Target IP: %u.%u.%u.%u\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF); printf("CLZ values for each route:\n"); for (int i = 0; i < num_routes; i++) { if (is_prefix_match(target_ip, routes[i].ip_prefix, routes[i].prefix_length)) { uint32_t xor_result = target_ip ^ routes[i].ip_prefix; routes[i].leading_zeros = count_leading_zeros(xor_result); printf("Route %u.%u.%u.%u/%u: CLZ = %u\n", (routes[i].ip_prefix >> 24) & 0xFF, (routes[i].ip_prefix >> 16) & 0xFF, (routes[i].ip_prefix >> 8) & 0xFF, routes[i].ip_prefix & 0xFF, routes[i].prefix_length, routes[i].leading_zeros); if (routes[i].leading_zeros > max_leading_zeros) { max_leading_zeros = routes[i].leading_zeros; best_match_index = i; } } } if (best_match_index != -1) { printf("Best matching route for IP %u.%u.%u.%u is: %u.%u.%u.%u/%u\n\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF, (routes[best_match_index].ip_prefix >> 24) & 0xFF, (routes[best_match_index].ip_prefix >> 16) & 0xFF, (routes[best_match_index].ip_prefix >> 8) & 0xFF, routes[best_match_index].ip_prefix & 0xFF, routes[best_match_index].prefix_length); } else { printf("No matching route found for IP %u.%u.%u.%u\n\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF); } } int main() { RouteEntry routes[] = { {0x0A000000, 8, 0}, // 10.0.0.0/8 {0x0A010000, 16, 0}, // 10.1.0.0/16 {0x0A010100, 24, 0}, // 10.1.1.0/24 {0xC0A80000, 16, 0} // 192.168.0.0/16 }; int num_routes = sizeof(routes) / sizeof(routes[0]); uint32_t ips[] = { 0x0A010137, // 10.1.1.55 0x0A0000FF, // 10.0.0.255 0xC0A80001 // 192.168.0.1 }; int num_ips = sizeof(ips) / sizeof(ips[0]); for (int i = 0; i < num_ips; i++) { find_best_match(ips[i], routes, num_routes); } return 0; }

Rewrite Count Leading Zeros

I rewrite the count leading zero function in byte shift, and the code is shown below.

uint16_t count_leading_zeros(uint32_t x) {
    if (x == 0) return 32;
    int n = 1;
    if ((x >> 16) == 0) { n += 16; x <<= 16; }
    if ((x >> 24) == 0) { n += 8; x <<= 8; }
    if ((x >> 28) == 0) { n += 4; x <<= 4; }
    if ((x >> 30) == 0) { n += 2; x <<= 2; }
    n = n - (x >> 31);
    return n;
}

After we excute the code in C compiler, we can obtain the same result.

Target IP: 10.1.1.55
CLZ values for each route:
Route 10.0.0.0/8: CLZ = 15
Route 10.1.0.0/16: CLZ = 23
Route 10.1.1.0/24: CLZ = 26
Best matching route for IP 10.1.1.55 is: 10.1.1.0/24

Target IP: 10.0.0.255
CLZ values for each route:
Route 10.0.0.0/8: CLZ = 24
Best matching route for IP 10.0.0.255 is: 10.0.0.0/8

Target IP: 192.168.0.1
CLZ values for each route:
Route 192.168.0.0/16: CLZ = 31
Best matching route for IP 192.168.0.1 is: 192.168.0.0/16

Assembly code implement in Ripes

I also rewrite the assembly code in byte shift version, and the code is shown below.

Assembly code
.data
routes:
    .word 0x0A000000, 8, 0
    .word 0x0A010000, 16, 0
    .word 0x0A010100, 24, 0
    .word 0xC0A80000, 16, 0

ips:
    .word 0x0A010137
    .word 0x0A0000FF
    .word 0xC0A80001

num_routes: .word 4
num_ips: .word 3
newline: .string "\n"
best_match_ip: .word 0
best_match_prefix: .word 0
dot: .string "."
slash: .string "/"
.text
.globl main

main:
    # Load base addresses of routes, ips, num_routes, and num_ips into registers
    la a1, routes
    la t1, ips
    la t0, num_routes
    lw a2, 0(t0)
    la t0, num_ips
    lw a3, 0(t0)
    # Loop through each IP and find the best match
ip_loop:
    beqz a3, exit        # If no more IPs, exit the loop
    lw a0, 0(t1)         # ips Load the current IP into a0
    j find_best_match # Call the find_best_match function
exit:
    li a7, 10
    ecall
count_leading_zeros:
    addi a4, zero, 1
    srli s2, t2, 16
    bne s2, zero, bs24
    addi a4, a4, 16
    slli t2, t2, 16
bs24:
    srli s2, t2, 24
    bne s2, zero, bs28
    addi a4, a4, 8
    slli t2, t2, 8
bs28:
    srli s2, t2, 28
    bne s2, zero, bs30
    addi a4, a4, 4
    slli t2, t2, 4
bs30:
    srli s2, t2, 30
    bne s2, zero, bs31
    addi a4, a4, 2
    slli t2, t2, 2
bs31:
    srli t2, t2, 31
    sub a4, a4, t2
    ret
    
is_prefix_match:
    # a0: target_ip
    # a1: prefix
    # a4: prefix_length
    # Calculate the mask based on prefix_length
    li   t2, 32          # Load 32 into t2
    sub  t2, t2, a4      # Subtract prefix_length from 32
    li   t3, 1           # Load 1 into t3
    sll  t3, t3, t2      # Shift left 1 by the result of the subtraction
    addi t3, t3, -1      # Subtract 1 from the result to get a mask with leading zeros
    not  t2, t3          # Bitwise NOT to get the mask

    # Apply the mask to target_ip
    and  t2, a0, t2      # AND operation between target_ip and mask
    # Compare the result with prefix
    li   a5, 0           # Set return value to 0 (no match)
    beq  t2, t0, match   # If they are equal, it's a match
    ret

match:
    li   a5, 1           # Set return value to 1 (match)
    ret


find_best_match:
    # a0: target_ip
    # a1: base address of routes
    # a2: num_routes

    # Initialize best_match_index, max_leading_zeros, and route_index
    li t4, -1            # t4 will hold best_match_index
    li t5, 0             # t5 will hold max_leading_zeros
    li t6, 0             # t6 will hold route_index (number of routes traversed)
    li s3, 0             # s3 will hold best_match_ip
    li s4, 0             # s4 will hold best_match_prefix

loop_routes:
    beqz a2, end_loop    # If no more routes, end the loop
    # Load prefix and prefix_length from routes
    lw t0, 0(a1)         # t0 holds ip_prefix
    lbu a4, 4(a1)        # a3 holds prefix_length
    # Check if prefix matches
    jal ra, is_prefix_match
    beqz a5, skip_route  # If no match, skip to next route

    # Calculate leading zeros
    xor t2, a0, t0       # XOR target_ip and ip_prefix 
    jal ra, count_leading_zeros

    # Check if current route has more leading zeros than previous best
    blt t5, a4, update_best

    # If you reach here, it means the route matched but was not the best match
    # So, you can skip to the next route
    j skip_route
    
skip_route:
    addi a1, a1, 12      # Move to the next route entry
    addi a2, a2, -1      # Decrement num_routes
    addi t6, t6, 1       # Increment route_index
    j loop_routes

update_best:
    mv t5, a4            # Update max_leading_zeros
    mv t4, t6            # Update best_match_index with current route_index
    lw s3, 0(a1)         # Save the best match IP from routes
    lbu s4, 4(a1)        # Save the best match prefix from routes
    j skip_route

print_ip:
    # s3 contains the IP to be printed
    # Extract and print the first octet
    srli s5, s3, 24
    andi s5, s5, 0xFF
    mv a0, s5
    li a7, 1
    ecall
    # Print a dot
    la a0, dot
    li a7, 4
    ecall
    # Extract and print the second octet
    srli s5, s3, 16
    andi s5, s5, 0xFF
    mv a0, s5
    li a7, 1
    ecall
    la a0, dot           # Print a dot
    li a7, 4
    ecall
    srli s5, s3, 8       # Extract and print the third octet
    andi s5, s5, 0xFF
    mv a0, s5
    li a7, 1
    ecall
    # Print a dot
    la a0, dot
    li a7, 4
    ecall
    andi s5, s3, 0xFF   # Extract and print the fourth octet
    mv a0, s5
    li a7, 1
    ecall
    ret
end_loop:
    mv a0, s3           # Print the best match IP
    jal ra, print_ip
    la a0, slash
    li a7, 4
    ecall
    mv a0, s4           # Print the best match prefix
    li a7, 1
    ecall
    la a0, newline
    li a7, 4
    ecall
    li a2, 4
    addi t1, t1, 4      # Move to the next IP
    addi a3, a3, -1     # Decrement the IP count
    la a1, routes
    j ip_loop

Comparing original code and byte shift version in Ripes

Original Byte shift
Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →
Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Optimized by riscv-none-elf-gcc

Makefile

Before compilering the C code, we have to modify Makefile for our file first. I refered to the Makefile in chacha20 folder and rewrite it. The code shown below is my own Makefile. The Makefile tells make insruction how to compile and link a program.

.PHONY: clean

include ../../../mk/toolchain.mk

CFLAGS = -march=rv32i -mabi=ilp32 -Os

OBJS = ipcheck.o
BIN = ipcheck.elf

%.o: %.s
	$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<

%.o: %.c
	$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<

all: $(BIN)

$(BIN): $(OBJS)
	 $(CROSS_COMPILE)gcc -o $@ $<

clean:
	$(RM) $(BIN) $(OBJS)

After we excute make command, it will compiler C file (or Assembly file) to object file first, and then build the ELF file. If the compilation is sucessed, we wil get the following response.

riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -Os -c -o ipcheck.o ipcheck.s
riscv-none-elf-gc -o ipcheck.elf ipcheck.o

Using GNU Toolchain

Run ipcheck.elf using the command line below:

$ ../../build/rv32emu ipcheck.elf

Then we can check the ELF file by following three instructions.

  1. objdump

-d : Display the assembler mnemonics for the machine instructions

$ riscv-none-elf-objdump -d ipcheck.elf

We will get the machine instructions below.

Disassembly code
ipcheck.elf:     file format elf32-littleriscv


Disassembly of section .text:

00010094 <exit>:
   10094:	1141                	add	sp,sp,-16
   10096:	4581                	li	a1,0
   10098:	c422                	sw	s0,8(sp)
   1009a:	c606                	sw	ra,12(sp)
   1009c:	842a                	mv	s0,a0
   1009e:	2bf000ef          	jal	10b5c <__call_exitprocs>
   100a2:	da01a783          	lw	a5,-608(gp) # 125b0 <__stdio_exit_handler>
   100a6:	c391                	beqz	a5,100aa <exit+0x16>
   100a8:	9782                	jalr	a5
   100aa:	8522                	mv	a0,s0
   100ac:	023010ef          	jal	118ce <_exit>

000100b0 <register_fini>:
   100b0:	00000793          	li	a5,0
   100b4:	c791                	beqz	a5,100c0 <register_fini+0x10>
   100b6:	00001517          	auipc	a0,0x1
   100ba:	91c50513          	add	a0,a0,-1764 # 109d2 <__libc_fini_array>
   100be:	a4ed                	j	103a8 <atexit>
   100c0:	8082                	ret

000100c2 <_start>:
   100c2:	00002197          	auipc	gp,0x2
   100c6:	74e18193          	add	gp,gp,1870 # 12810 <__global_pointer$>
   100ca:	d9c18513          	add	a0,gp,-612 # 125ac <completed.1>
   100ce:	0e418613          	add	a2,gp,228 # 128f4 <__BSS_END__>
   100d2:	8e09                	sub	a2,a2,a0
   100d4:	4581                	li	a1,0
   100d6:	14d000ef          	jal	10a22 <memset>
   100da:	00001517          	auipc	a0,0x1
   100de:	8f850513          	add	a0,a0,-1800 # 109d2 <__libc_fini_array>
   100e2:	24d9                	jal	103a8 <atexit>
   100e4:	085000ef          	jal	10968 <__libc_init_array>
   100e8:	4502                	lw	a0,0(sp)
   100ea:	004c                	add	a1,sp,4
   100ec:	4601                	li	a2,0
   100ee:	2899                	jal	10144 <main>
   100f0:	b755                	j	10094 <exit>

000100f2 <__do_global_dtors_aux>:
   100f2:	1141                	add	sp,sp,-16
   100f4:	c422                	sw	s0,8(sp)
   100f6:	d9c18413          	add	s0,gp,-612 # 125ac <completed.1>
   100fa:	00044783          	lbu	a5,0(s0)
   100fe:	c606                	sw	ra,12(sp)
   10100:	ef99                	bnez	a5,1011e <__do_global_dtors_aux+0x2c>
   10102:	00000793          	li	a5,0
   10106:	cb89                	beqz	a5,10118 <__do_global_dtors_aux+0x26>
   10108:	00002517          	auipc	a0,0x2
   1010c:	ef850513          	add	a0,a0,-264 # 12000 <__EH_FRAME_BEGIN__>
   10110:	00000097          	auipc	ra,0x0
   10114:	000000e7          	jalr	zero # 0 <str_dot_len-0x1>
   10118:	4785                	li	a5,1
   1011a:	00f40023          	sb	a5,0(s0)
   1011e:	40b2                	lw	ra,12(sp)
   10120:	4422                	lw	s0,8(sp)
   10122:	0141                	add	sp,sp,16
   10124:	8082                	ret

00010126 <frame_dummy>:
   10126:	00000793          	li	a5,0
   1012a:	cb99                	beqz	a5,10140 <frame_dummy+0x1a>
   1012c:	ddc18593          	add	a1,gp,-548 # 125ec <object.0>
   10130:	00002517          	auipc	a0,0x2
   10134:	ed050513          	add	a0,a0,-304 # 12000 <__EH_FRAME_BEGIN__>
   10138:	00000317          	auipc	t1,0x0
   1013c:	00000067          	jr	zero # 0 <str_dot_len-0x1>
   10140:	8082                	ret
	...
  1. readelf

-h : Display the ELF file header

$ riscv-none-elf-readelf -h ipcheck.elf

We wil get the following output:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100c2
  Start of program headers:          52 (bytes into file)
  Start of section headers:          16304 (bytes into file)
  Flags:                             0x1
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         14
  Section header string table index: 13
  1. size

list the section sizes—and the total size—for each of the object or archive files objfile in its argument list.

$ riscv-none-elf-size ipcheck.elf

We wil get the following output:

 text	   data	    bss	    dec	    hex	filename
 6444	   1452	    840	   8736	   2220	ipcheck.elf

Furthermore, we can check our register usage and instruction frequency by following two commands.

We have to use make tool commond in rv32emu folder, then we can use the rv_histogram command.

  1. RV32 Target Instruction Frequency Histogram
$ ../../build/rv_histogram -a ipcheck.elf
  1. RV32 Target Register Usage Histogram
$ ../../build/rv_histogram -r ipcheck.elf

Performance Analysis

We can add the ticks.c into our C code thus we can analyze the cycles.

C code with ticks
#include <inttypes.h> #include <stdint.h> #include <stdio.h> typedef uint64_t ticks; static inline ticks getticks(void) { uint64_t result; uint32_t l, h, h2; asm volatile( "rdcycleh %0\n" "rdcycle %1\n" "rdcycleh %2\n" "sub %0, %0, %2\n" "seqz %0, %0\n" "sub %0, zero, %0\n" "and %1, %1, %0\n" : "=r"(h), "=r"(l), "=r"(h2)); result = (((uint64_t) h) << 32) | ((uint64_t) l); return result; } typedef struct { uint32_t ip_prefix; uint8_t prefix_length; uint16_t leading_zeros; } RouteEntry; uint16_t count_leading_zeros(uint32_t x) { if (x == 0) return 32; int n = 1; if ((x >> 16) == 0) { n += 16; x <<= 16; } if ((x >> 24) == 0) { n += 8; x <<= 8; } if ((x >> 28) == 0) { n += 4; x <<= 4; } if ((x >> 30) == 0) { n += 2; x <<= 2; } n = n - (x >> 31); return n; } int is_prefix_match(uint32_t target_ip, uint32_t prefix, uint8_t prefix_length) { uint32_t mask = ~((1 << (32 - prefix_length)) - 1); return (target_ip & mask) == prefix; } void find_best_match(uint32_t target_ip, RouteEntry* routes, int num_routes) { int best_match_index = -1; uint16_t max_leading_zeros = 0; printf("Target IP: %u.%u.%u.%u\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF); printf("CLZ values for each route:\n"); for (int i = 0; i < num_routes; i++) { if (is_prefix_match(target_ip, routes[i].ip_prefix, routes[i].prefix_length)) { uint32_t xor_result = target_ip ^ routes[i].ip_prefix; routes[i].leading_zeros = count_leading_zeros(xor_result); printf("Route %u.%u.%u.%u/%u: CLZ = %u\n", (routes[i].ip_prefix >> 24) & 0xFF, (routes[i].ip_prefix >> 16) & 0xFF, (routes[i].ip_prefix >> 8) & 0xFF, routes[i].ip_prefix & 0xFF, routes[i].prefix_length, routes[i].leading_zeros); if (routes[i].leading_zeros > max_leading_zeros) { max_leading_zeros = routes[i].leading_zeros; best_match_index = i; } } } if (best_match_index != -1) { printf("Best matching route for IP %u.%u.%u.%u is: %u.%u.%u.%u/%u\n\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF, (routes[best_match_index].ip_prefix >> 24) & 0xFF, (routes[best_match_index].ip_prefix >> 16) & 0xFF, (routes[best_match_index].ip_prefix >> 8) & 0xFF, routes[best_match_index].ip_prefix & 0xFF, routes[best_match_index].prefix_length); } else { printf("No matching route found for IP %u.%u.%u.%u\n\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF); } } int main() { RouteEntry routes[] = { {0x0A000000, 8, 0}, // 10.0.0.0/8 {0x0A010000, 16, 0}, // 10.1.0.0/16 {0x0A010100, 24, 0}, // 10.1.1.0/24 {0xC0A80000, 16, 0} // 192.168.0.0/16 }; int num_routes = sizeof(routes) / sizeof(routes[0]); uint32_t ips[] = { 0x0A010137, // 10.1.1.55 0x0A0000FF, // 10.0.0.255 0xC0A80001 // 192.168.0.1 }; int num_ips = sizeof(ips) / sizeof(ips[0]); ticks t0 = getticks(); for (int i = 0; i < num_ips; i++) { find_best_match(ips[i], routes, num_routes); } ticks t1 = getticks(); printf("elapsed cycle: %" PRIu64 "\n", t1 - t0); return 0; }

-O0 Optimized Result

ELF file header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100c2
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69436 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

ELF file size

 text	   data	    bss	    dec	    hex	filename
52966	   1876	   1528	  56370	   dc32	ipcheck.elf

Elapsed cycle : 46375

Instruction Frequency Histogram :

Target Register Usage Histogram :

-O1 Optimized Result

ELF file header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100c2
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69436 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

ELF file size

 text	   data	    bss	    dec	    hex	filename
52274	   1876	   1528	  55678	   d97e	ipcheck.elf

Elapsed cycle : 45408

Instruction Frequency Histogram :

Target Register Usage Histogram :

-O2 Optimized Result

ELF file header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100b2
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69452 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

ELF file size

 text	   data	    bss	    dec	    hex	filename
52306	   1876	   1528	  55710	   d99e	ipcheck.elf

Elapsed cycle : 45403

Instruction Frequency Histogram :

Target Register Usage Histogram :

-O3 Optimized Result

ELF file header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x1061a
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69860 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

ELF file size

 text	   data	    bss	    dec	    hex	filename
53528	   1876	   1528	  56932	   de64	ipcheck.elf

Elapsed cycle : 45174

Instruction Frequency Histogram :

Target Register Usage Histogram :

-Ofast Optimized Result

ELF file header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x1061a
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69860 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

ELF file size

 text	   data	    bss	    dec	    hex	filename
53528	   1876	   1528	  56932	   de64	ipcheck.elf

Elapsed cycle : 45174

Instruction Frequency Histogram :

Target Register Usage Histogram :

-Os Optimized Result

ELF file header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x1017e
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69452 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

ELF file size

 text	   data	    bss	    dec	    hex	filename
52248	   1876	   1528	  55652	   d964	ipcheck.elf

Elapsed cycle : 45438

Instruction Frequency Histogram :

Target Register Usage Histogram :

Handwriting Assembly code

Rv32emu and Ripes may not work together, therefore we have to rewrite the assembly code before we compiler it. We can check the docs/syscall.md and src/syscall.c in advance.

For example:
We have to change exit system call from

    li a7, 10                           # exit program
    ecall

into

    li a0, 0
    li a7, 93                           # exit program
    ecall

The code below is my rewrited assembly code :

Rewrite assembly code
.org 0
.global main

# New syscall 
.set SYSEXIT, 93
.set SYSWRITE, 64
.data
routes:
    .word 0x0A000000, 8, 0
    .word 0x0A010000, 16, 0
    .word 0x0A010100, 24, 0
    .word 0xC0A80000, 16, 0

ips:
    .word 0x0A010137
    .word 0x0A0000FF
    .word 0xC0A80001

num_routes: 
    .word 4
num_ips:  .word 3

nextline:    
    .ascii  "\n"
    .set str_next_len, .-nextline

best_match_ip: 
    .word 0
best_match_prefix: 
    .word 0
dot:       
    .ascii  "."
    .set str_dot_len, .-dot
slash:     
    .ascii  "/"
    .set str_slash_len, .-slash

.text
main:
    # Load base addresses of routes, ips, num_routes, and num_ips into registers
    la a1, routes
    la t1, ips
    la t0, num_routes
    lw a2, 0(t0)
    la t0, num_ips
    lw a3, 0(t0)
    # Loop through each IP and find the best match
ip_loop:
    beqz a3, exit       		# If no more IPs, exit the loop
    lw a0, 0(t1)         		# ips Load the current IP into a0
    j find_best_match 			# Call the find_best_match function
exit:
    li a7, SYSEXIT
    addi a0, x0, 0
    ecall
count_leading_zeros:
    addi a4, zero, 1
    srli s2, t2, 16
    bne s2, zero, bs24
    addi a4, a4, 16
    slli t2, t2, 16
bs24:
    srli s2, t2, 24
    bne s2, zero, bs28
    addi a4, a4, 8
    slli t2, t2, 8
bs28:
    srli s2, t2, 28
    bne s2, zero, bs30
    addi a4, a4, 4
    slli t2, t2, 4
bs30:
    srli s2, t2, 30
    bne s2, zero, bs31
    addi a4, a4, 2
    slli t2, t2, 2
bs31:
    srli t2, t2, 31
    sub a4, a4, t2
    ret
    
is_prefix_match:
    # a0: target_ip
    # a1: prefix
    # a4: prefix_length
    # Calculate the mask based on prefix_length
    li   t2, 32          # Load 32 into t2
    sub  t2, t2, a4      # Subtract prefix_length from 32
    li   t3, 1           # Load 1 into t3
    sll  t3, t3, t2      # Shift left 1 by the result of the subtraction
    addi t3, t3, -1      # Subtract 1 from the result to get a mask with leading zeros
    not  t2, t3          # Bitwise NOT to get the mask

    # Apply the mask to target_ip
    and  t2, a0, t2      # AND operation between target_ip and mask
    # Compare the result with prefix
    li   a5, 0           # Set return value to 0 (no match)
    beq  t2, t0, match   # If they are equal, it's a match
    ret

match:
    li   a5, 1           # Set return value to 1 (match)
    ret


find_best_match:
    # a0: target_ip
    # a1: base address of routes
    # a2: num_routes

    # Initialize best_match_index, max_leading_zeros, and route_index
    li t4, -1            # t4 will hold best_match_index
    li t5, 0             # t5 will hold max_leading_zeros
    li t6, 0             # t6 will hold route_index (number of routes traversed)
    li s3, 0             # s3 will hold best_match_ip
    li s4, 0             # s4 will hold best_match_prefix

loop_routes:
    beqz a2, end_loop    # If no more routes, end the loop
    # Load prefix and prefix_length from routes
    lw t0, 0(a1)         # t0 holds ip_prefix
    lbu a4, 4(a1)        # a3 holds prefix_length
    # Check if prefix matches
    jal ra, is_prefix_match
    beqz a5, skip_route  # If no match, skip to next route

    # Calculate leading zeros
    xor t2, a0, t0       # XOR target_ip and ip_prefix 
    jal ra, count_leading_zeros

    # Check if current route has more leading zeros than previous best
    blt t5, a4, update_best

    # If you reach here, it means the route matched but was not the best match
    # So, you can skip to the next route
    j skip_route
    
skip_route:
    addi a1, a1, 12      # Move to the next route entry
    addi a2, a2, -1      # Decrement num_routes
    addi t6, t6, 1       # Increment route_index
    j loop_routes

update_best:
    mv t5, a4            # Update max_leading_zeros
    mv t4, t6            # Update best_match_index with current route_index
    lw s3, 0(a1)         # Save the best match IP from routes
    lbu s4, 4(a1)        # Save the best match prefix from routes
    j skip_route

print_ip:
    # s3 contains the IP to be printed
    # Extract and print the first octet
    srli s5, s3, 24
    andi s5, s5, 0xFF
    addi sp, sp, -16
    sw s5, 0(sp)
    sw a1, 4(sp)
    sw a2, 8(sp)
    sw ra, 12(sp)
    addi a1, sp, 0    
    li a7, SYSWRITE
    li a0, 1
    li a2, 4
    ecall   
    # Print a dot
    li a7, SYSWRITE
    li a0, 1
    la a1, dot
    li a2, str_dot_len
    ecall
    # Extract and print the second octet
    srli s5, s3, 16
    andi s5, s5, 0xFF
    sw s5, 0(sp)
    addi a1, sp, 0    
    li a7, SYSWRITE
    li a0, 1
    li a2, 4
    ecall
    li a7, SYSWRITE
    li a0, 1
    la a1, dot
    li a2, str_dot_len
    ecall
    srli s5, s3, 8       # Extract and print the third octet
    andi s5, s5, 0xFF
    sw s5, 0(sp)
    addi a1, sp, 0    
    li a7, SYSWRITE
    li a0, 1
    li a2, 4
    ecall
    # Print a dot
    li a7, SYSWRITE
    li a0, 1
    la a1, dot
    li a2, str_dot_len
    ecall
    andi s5, s3, 0xFF   # Extract and print the fourth octet
    sw s5, 0(sp)
    addi a1, sp, 0    
    li a7, SYSWRITE
    li a0, 1
    li a2, 4
    ecall
    lw a1, 4(sp)
    lw a2, 8(sp)
    lw ra, 12(sp)
    addi sp, sp, 16
    ret
end_loop:
    mv a0, s3           # Print the best match IP
    jal ra, print_ip
    addi sp, sp, -12
    sw a1, 4(sp)
    sw a2, 8(sp)    
    li a7, SYSWRITE
    li a0, 1
    la a1, slash
    li a2, str_slash_len
    ecall
    sw s4, 0(sp)    
    addi a1, sp, 0    
    li a7, SYSWRITE
    li a0, 1
    li a2, 4
    ecall
    li a7, SYSWRITE
    li a0, 1
    la a1, nextline
    li a2, str_next_len
    ecall
    lw a1, 4(sp)
    lw a2, 8(sp)
    addi sp, sp, 12 
    li a2, 4
    addi t1, t1, 4      # Move to the next IP
    addi a3, a3, -1     # Decrement the IP count
    la a1, routes
    j ip_loop

But there are some trouble with displaying numbers when I excute the program. The numbers can't display correctly.

.../

.../
 . ../
inferior exit code 0

After googling, this article provide some methods to solve this problem. We can use printf commond to display integer. Thus, I rewrite the assembly code into printf version again.

Printf version
.org 0
.global main

# New syscall 
.set SYSEXIT, 93
.set SYSWRITE, 64
.data
routes:
    .word 0x0A000000, 8, 0
    .word 0x0A010000, 16, 0
    .word 0x0A010100, 24, 0
    .word 0xC0A80000, 16, 0

ips:
    .word 0x0A010137
    .word 0x0A0000FF
    .word 0xC0A80001
iformat:
    .string "%d"
num_routes: 
    .word 4
num_ips:  .word 3

nextline:    
    .ascii  "\n"
    .set str_next_len, .-nextline

best_match_ip: 
    .word 0
best_match_prefix: 
    .word 0
dot:       
    .ascii  "."
    .set str_dot_len, .-dot
slash:     
    .ascii  "/"
    .set str_slash_len, .-slash

.text
main:
    # Load base addresses of routes, ips, num_routes, and num_ips into registers
    la a1, routes
    la t1, ips
    la t0, num_routes
    lw a2, 0(t0)
    la t0, num_ips
    lw a3, 0(t0)
    # Loop through each IP and find the best match
ip_loop:
    beqz a3, exit       		# If no more IPs, exit the loop
    lw a0, 0(t1)         		# ips Load the current IP into a0
    j find_best_match 			# Call the find_best_match function
exit:
    li a7, SYSEXIT
    addi a0, x0, 0
    ecall
count_leading_zeros:
    addi a4, zero, 1
    srli s2, t2, 16
    bne s2, zero, bs24
    addi a4, a4, 16
    slli t2, t2, 16
bs24:
    srli s2, t2, 24
    bne s2, zero, bs28
    addi a4, a4, 8
    slli t2, t2, 8
bs28:
    srli s2, t2, 28
    bne s2, zero, bs30
    addi a4, a4, 4
    slli t2, t2, 4
bs30:
    srli s2, t2, 30
    bne s2, zero, bs31
    addi a4, a4, 2
    slli t2, t2, 2
bs31:
    srli t2, t2, 31
    sub a4, a4, t2
    ret
    
is_prefix_match:
    # a0: target_ip
    # a1: prefix
    # a4: prefix_length
    # Calculate the mask based on prefix_length
    li   t2, 32          # Load 32 into t2
    sub  t2, t2, a4      # Subtract prefix_length from 32
    li   t3, 1           # Load 1 into t3
    sll  t3, t3, t2      # Shift left 1 by the result of the subtraction
    addi t3, t3, -1      # Subtract 1 from the result to get a mask with leading zeros
    not  t2, t3          # Bitwise NOT to get the mask

    # Apply the mask to target_ip
    and  t2, a0, t2      # AND operation between target_ip and mask
    # Compare the result with prefix
    li   a5, 0           # Set return value to 0 (no match)
    beq  t2, t0, match   # If they are equal, it's a match
    ret

match:
    li   a5, 1           # Set return value to 1 (match)
    ret


find_best_match:
    # a0: target_ip
    # a1: base address of routes
    # a2: num_routes

    # Initialize best_match_index, max_leading_zeros, and route_index
    li t4, -1            # t4 will hold best_match_index
    li t5, 0             # t5 will hold max_leading_zeros
    li t6, 0             # t6 will hold route_index (number of routes traversed)
    li s3, 0             # s3 will hold best_match_ip
    li s4, 0             # s4 will hold best_match_prefix

loop_routes:
    beqz a2, end_loop    # If no more routes, end the loop
    # Load prefix and prefix_length from routes
    lw t0, 0(a1)         # t0 holds ip_prefix
    lbu a4, 4(a1)        # a3 holds prefix_length
    # Check if prefix matches
    jal ra, is_prefix_match
    beqz a5, skip_route  # If no match, skip to next route

    # Calculate leading zeros
    xor t2, a0, t0       # XOR target_ip and ip_prefix 
    jal ra, count_leading_zeros

    # Check if current route has more leading zeros than previous best
    blt t5, a4, update_best

    # If you reach here, it means the route matched but was not the best match
    # So, you can skip to the next route
    j skip_route
    
skip_route:
    addi a1, a1, 12      # Move to the next route entry
    addi a2, a2, -1      # Decrement num_routes
    addi t6, t6, 1       # Increment route_index
    j loop_routes

update_best:
    mv t5, a4            # Update max_leading_zeros
    mv t4, t6            # Update best_match_index with current route_index
    lw s3, 0(a1)         # Save the best match IP from routes
    lbu s4, 4(a1)        # Save the best match prefix from routes
    j skip_route

print_ip:
    # s3 contains the IP to be printed
    # Extract and print the first octet
    srli s5, s3, 24
    andi s5, s5, 0xFF
    addi sp, sp, -16
    sw s5, 0(sp)
    sw a1, 4(sp)
    sw a2, 8(sp)
    sw ra, 12(sp)
    la a0, iformat
    lw a1, 0(sp)
    call printf    
    # Print a dot
    li a7, SYSWRITE
    li a0, 1
    la a1, dot
    li a2, str_dot_len
    ecall
    # Extract and print the second octet
    srli s5, s3, 16
    andi s5, s5, 0xFF
    sw s5, 0(sp)
    la a0, iformat
    lw a1, 0(sp)
    call printf   
    li a7, SYSWRITE
    li a0, 1
    la a1, dot
    li a2, str_dot_len
    ecall
    srli s5, s3, 8       # Extract and print the third octet
    andi s5, s5, 0xFF
    sw s5, 0(sp)
    la a0, iformat
    lw a1, 0(sp)
    call printf   
    # Print a dot
    li a7, SYSWRITE
    li a0, 1
    la a1, dot
    li a2, str_dot_len
    ecall
    andi s5, s3, 0xFF   # Extract and print the fourth octet
    sw s5, 0(sp)
    la a0, iformat
    lw a1, 0(sp)
    call printf   
    lw a1, 4(sp)
    lw a2, 8(sp)
    lw ra, 12(sp)
    addi sp, sp, 16
    ret
end_loop:
    mv a0, s3           # Print the best match IP
    jal ra, print_ip
    addi sp, sp, -16
    sw a1, 4(sp)
    sw a2, 8(sp)
    sw ra, 12(sp)    
    li a7, SYSWRITE
    li a0, 1
    la a1, slash
    li a2, str_slash_len
    ecall
    sw s4, 0(sp)    
    la a0, iformat
    lw a1, 0(sp)
    call printf   
    li a7, SYSWRITE
    li a0, 1
    la a1, nextline
    li a2, str_next_len
    ecall
    lw a1, 4(sp)
    lw a2, 8(sp)
    lw ra, 12(sp)
    addi sp, sp, 16 
    li a2, 4
    addi t1, t1, 4      # Move to the next IP
    addi a3, a3, -1     # Decrement the IP count
    la a1, routes
    j ip_loop

However, it still can't print the result correctly. Rv32emu seems not support printf commond. Therefore, I decide to use the first assembly code to generate ELF file and analyze it.

ELF file header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100c2
  Start of program headers:          52 (bytes into file)
  Start of section headers:          16304 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         14
  Section header string table index: 13

ELF file size

 text	   data	    bss	    dec	    hex	filename
 6444	   1452	    840	   8736	   2220	ipcheck.elf

cycle count : 666

Instruction Frequency Histogram :

Target Register Usage Histogram :

Analyze

Comparing handwriting assemble with assembly code generated by riscv-none-elf-gcc.

Handwriting O0 O1 O2 O3 Os Ofast
Size 6444 52966 52274 52306 53528 52248 53528

We can notice there is a tremendous difference in size between handwriting assembly code and generated code. Furthermore, Os optimized has smallest among the optimization levels.

Specifies the degree to which the generated code is optimized for speed and binary size.
The different among the optimization level are shown below.

  1. None[-O0]
    Do not optimize. With this setting, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results.

  2. Fast[-O1]
    Optimizing compilation takes somewhat more time, and a lot more memory for a large function. With this setting, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time

  3. Faster[-O2]
    The compiler performs nearly all supported optimizations that do not involve a space-speed tradeoff. With this setting, the compiler does not perform loop unrolling or function inlining, or register renaming. As compared to the 'Fast' setting, this setting increases both compilation time and the performance of the generated code.

  4. Fastest[-O3]
    Turns on all optimizations specified by the 'Faster' setting and also turns on function inlining and register renaming options. This setting may result in a larger binary.

  5. Fastest, Smallest[-Os]
    Optimize for size. This setting enables all 'Faster' optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.

  6. Fastest, Aggressive Optimizations[-Ofast]
    This setting enables 'Fastest' but also enables aggressive optimizations that may break strict standards compliance but should work well on well-behaved code.

We can also compare the cycles among different optimization level

Handwriting O0 O1 O2 O3 Os Ofast
cycles 666 46375 45408 45403 45174 45438 45174

We can notice that handwriting assembly code still use less cycles and Ofast optimized has smallest cycles among the optimization levels. However, as the optimization level decrease, it will use less cycles to excute the program. But the cycles of Os is worse than others. But there
are only slight difference among the six optimization level
Furthermore, I found whichever optimization level I chose, the text size and cycles of handwriting assembly code have always been the same.

Other discovery

I am interested in which C instruction spend the most cycles. After trying, I found printf spend the majority of the cycles. Thus, I decide to remove all printf in my C code and count the cycles. Without printf the cycles reduce obviously, and the results are shown below.

O0 O1 O2 O3 Os Ofast
Count cycles 1135 356 279 7 429 7

Reference