contributed by < ChengChiTing
>
RISC-V
, jserv
When I follow the instructions in Lab2: RISC-V RV32I[MACF] emulator with ELF support,some error happened during "make" step. The Terminal response the message
/bin/sh: 1: cc: not found
/bin/sh: 1: cc: not found
check the file build/.config for configured items.
/bin/sh: 1: cc: not found
/bin/sh: 1: cc: not found
cc build/map.o
make: cc: Command not found
make *** [Makefile:142: build/map.o] Error 127
After googling, I found we can use this command to solve the problem
sudo apt-get install build-essential
After make
and make check
successed, rv32emu is installed on Ubuntu.
note! Everytime reset the Terminal, we have to execute the following command again :
$ cd $HOME
$ source riscv-none-elf-gcc/setenv
I choose the subject from 范紘維 , because I also use clz in my first homework and I think I can use some method to optimize this solution ( e.g. use iterative or byte shift ) . Furthermore, I am interested in how to check the IP routing with clz.
The C code below is the original code contributed by <fan1071221>
#include <stdio.h>
#include <stdint.h>
typedef struct {
uint32_t ip_prefix;
uint8_t prefix_length;
uint16_t leading_zeros;
} RouteEntry;
uint16_t count_leading_zeros(uint32_t x) {
if (x == 0) return 32;
int n = 1;
if ((x >> 16) == 0) { n += 16; x <<= 16; }
if ((x >> 24) == 0) { n += 8; x <<= 8; }
if ((x >> 28) == 0) { n += 4; x <<= 4; }
if ((x >> 30) == 0) { n += 2; x <<= 2; }
n = n - (x >> 31);
return n;
}
int is_prefix_match(uint32_t target_ip, uint32_t prefix, uint8_t prefix_length) {
uint32_t mask = ~((1 << (32 - prefix_length)) - 1);
return (target_ip & mask) == prefix;
}
void find_best_match(uint32_t target_ip, RouteEntry* routes, int num_routes) {
int best_match_index = -1;
uint16_t max_leading_zeros = 0;
printf("Target IP: %u.%u.%u.%u\n",
(target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF);
printf("CLZ values for each route:\n");
for (int i = 0; i < num_routes; i++) {
if (is_prefix_match(target_ip, routes[i].ip_prefix, routes[i].prefix_length)) {
uint32_t xor_result = target_ip ^ routes[i].ip_prefix;
routes[i].leading_zeros = count_leading_zeros(xor_result);
printf("Route %u.%u.%u.%u/%u: CLZ = %u\n",
(routes[i].ip_prefix >> 24) & 0xFF, (routes[i].ip_prefix >> 16) & 0xFF,
(routes[i].ip_prefix >> 8) & 0xFF, routes[i].ip_prefix & 0xFF,
routes[i].prefix_length, routes[i].leading_zeros);
if (routes[i].leading_zeros > max_leading_zeros) {
max_leading_zeros = routes[i].leading_zeros;
best_match_index = i;
}
}
}
if (best_match_index != -1) {
printf("Best matching route for IP %u.%u.%u.%u is: %u.%u.%u.%u/%u\n\n",
(target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF,
(routes[best_match_index].ip_prefix >> 24) & 0xFF, (routes[best_match_index].ip_prefix >> 16) & 0xFF,
(routes[best_match_index].ip_prefix >> 8) & 0xFF, routes[best_match_index].ip_prefix & 0xFF,
routes[best_match_index].prefix_length);
} else {
printf("No matching route found for IP %u.%u.%u.%u\n\n",
(target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF);
}
}
int main() {
RouteEntry routes[] = {
{0x0A000000, 8, 0}, // 10.0.0.0/8
{0x0A010000, 16, 0}, // 10.1.0.0/16
{0x0A010100, 24, 0}, // 10.1.1.0/24
{0xC0A80000, 16, 0} // 192.168.0.0/16
};
int num_routes = sizeof(routes) / sizeof(routes[0]);
uint32_t ips[] = {
0x0A010137, // 10.1.1.55
0x0A0000FF, // 10.0.0.255
0xC0A80001 // 192.168.0.1
};
int num_ips = sizeof(ips) / sizeof(ips[0]);
for (int i = 0; i < num_ips; i++) {
find_best_match(ips[i], routes, num_routes);
}
return 0;
}
I rewrite the count leading zero function in byte shift, and the code is shown below.
uint16_t count_leading_zeros(uint32_t x) {
if (x == 0) return 32;
int n = 1;
if ((x >> 16) == 0) { n += 16; x <<= 16; }
if ((x >> 24) == 0) { n += 8; x <<= 8; }
if ((x >> 28) == 0) { n += 4; x <<= 4; }
if ((x >> 30) == 0) { n += 2; x <<= 2; }
n = n - (x >> 31);
return n;
}
After we excute the code in C compiler, we can obtain the same result.
Target IP: 10.1.1.55
CLZ values for each route:
Route 10.0.0.0/8: CLZ = 15
Route 10.1.0.0/16: CLZ = 23
Route 10.1.1.0/24: CLZ = 26
Best matching route for IP 10.1.1.55 is: 10.1.1.0/24
Target IP: 10.0.0.255
CLZ values for each route:
Route 10.0.0.0/8: CLZ = 24
Best matching route for IP 10.0.0.255 is: 10.0.0.0/8
Target IP: 192.168.0.1
CLZ values for each route:
Route 192.168.0.0/16: CLZ = 31
Best matching route for IP 192.168.0.1 is: 192.168.0.0/16
I also rewrite the assembly code in byte shift version, and the code is shown below.
.data
routes:
.word 0x0A000000, 8, 0
.word 0x0A010000, 16, 0
.word 0x0A010100, 24, 0
.word 0xC0A80000, 16, 0
ips:
.word 0x0A010137
.word 0x0A0000FF
.word 0xC0A80001
num_routes: .word 4
num_ips: .word 3
newline: .string "\n"
best_match_ip: .word 0
best_match_prefix: .word 0
dot: .string "."
slash: .string "/"
.text
.globl main
main:
# Load base addresses of routes, ips, num_routes, and num_ips into registers
la a1, routes
la t1, ips
la t0, num_routes
lw a2, 0(t0)
la t0, num_ips
lw a3, 0(t0)
# Loop through each IP and find the best match
ip_loop:
beqz a3, exit # If no more IPs, exit the loop
lw a0, 0(t1) # ips Load the current IP into a0
j find_best_match # Call the find_best_match function
exit:
li a7, 10
ecall
count_leading_zeros:
addi a4, zero, 1
srli s2, t2, 16
bne s2, zero, bs24
addi a4, a4, 16
slli t2, t2, 16
bs24:
srli s2, t2, 24
bne s2, zero, bs28
addi a4, a4, 8
slli t2, t2, 8
bs28:
srli s2, t2, 28
bne s2, zero, bs30
addi a4, a4, 4
slli t2, t2, 4
bs30:
srli s2, t2, 30
bne s2, zero, bs31
addi a4, a4, 2
slli t2, t2, 2
bs31:
srli t2, t2, 31
sub a4, a4, t2
ret
is_prefix_match:
# a0: target_ip
# a1: prefix
# a4: prefix_length
# Calculate the mask based on prefix_length
li t2, 32 # Load 32 into t2
sub t2, t2, a4 # Subtract prefix_length from 32
li t3, 1 # Load 1 into t3
sll t3, t3, t2 # Shift left 1 by the result of the subtraction
addi t3, t3, -1 # Subtract 1 from the result to get a mask with leading zeros
not t2, t3 # Bitwise NOT to get the mask
# Apply the mask to target_ip
and t2, a0, t2 # AND operation between target_ip and mask
# Compare the result with prefix
li a5, 0 # Set return value to 0 (no match)
beq t2, t0, match # If they are equal, it's a match
ret
match:
li a5, 1 # Set return value to 1 (match)
ret
find_best_match:
# a0: target_ip
# a1: base address of routes
# a2: num_routes
# Initialize best_match_index, max_leading_zeros, and route_index
li t4, -1 # t4 will hold best_match_index
li t5, 0 # t5 will hold max_leading_zeros
li t6, 0 # t6 will hold route_index (number of routes traversed)
li s3, 0 # s3 will hold best_match_ip
li s4, 0 # s4 will hold best_match_prefix
loop_routes:
beqz a2, end_loop # If no more routes, end the loop
# Load prefix and prefix_length from routes
lw t0, 0(a1) # t0 holds ip_prefix
lbu a4, 4(a1) # a3 holds prefix_length
# Check if prefix matches
jal ra, is_prefix_match
beqz a5, skip_route # If no match, skip to next route
# Calculate leading zeros
xor t2, a0, t0 # XOR target_ip and ip_prefix
jal ra, count_leading_zeros
# Check if current route has more leading zeros than previous best
blt t5, a4, update_best
# If you reach here, it means the route matched but was not the best match
# So, you can skip to the next route
j skip_route
skip_route:
addi a1, a1, 12 # Move to the next route entry
addi a2, a2, -1 # Decrement num_routes
addi t6, t6, 1 # Increment route_index
j loop_routes
update_best:
mv t5, a4 # Update max_leading_zeros
mv t4, t6 # Update best_match_index with current route_index
lw s3, 0(a1) # Save the best match IP from routes
lbu s4, 4(a1) # Save the best match prefix from routes
j skip_route
print_ip:
# s3 contains the IP to be printed
# Extract and print the first octet
srli s5, s3, 24
andi s5, s5, 0xFF
mv a0, s5
li a7, 1
ecall
# Print a dot
la a0, dot
li a7, 4
ecall
# Extract and print the second octet
srli s5, s3, 16
andi s5, s5, 0xFF
mv a0, s5
li a7, 1
ecall
la a0, dot # Print a dot
li a7, 4
ecall
srli s5, s3, 8 # Extract and print the third octet
andi s5, s5, 0xFF
mv a0, s5
li a7, 1
ecall
# Print a dot
la a0, dot
li a7, 4
ecall
andi s5, s3, 0xFF # Extract and print the fourth octet
mv a0, s5
li a7, 1
ecall
ret
end_loop:
mv a0, s3 # Print the best match IP
jal ra, print_ip
la a0, slash
li a7, 4
ecall
mv a0, s4 # Print the best match prefix
li a7, 1
ecall
la a0, newline
li a7, 4
ecall
li a2, 4
addi t1, t1, 4 # Move to the next IP
addi a3, a3, -1 # Decrement the IP count
la a1, routes
j ip_loop
Original | Byte shift |
---|---|
Image Not Showing
Possible Reasons
|
Image Not Showing
Possible Reasons
|
Before compilering the C code, we have to modify Makefile
for our file first. I refered to the Makefile
in chacha20 folder and rewrite it. The code shown below is my own Makefile
. The Makefile
tells make
insruction how to compile and link a program.
.PHONY: clean
include ../../../mk/toolchain.mk
CFLAGS = -march=rv32i -mabi=ilp32 -Os
OBJS = ipcheck.o
BIN = ipcheck.elf
%.o: %.s
$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<
%.o: %.c
$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<
all: $(BIN)
$(BIN): $(OBJS)
$(CROSS_COMPILE)gcc -o $@ $<
clean:
$(RM) $(BIN) $(OBJS)
After we excute make
command, it will compiler C file (or Assembly file) to object file first, and then build the ELF file. If the compilation is sucessed, we wil get the following response.
riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -Os -c -o ipcheck.o ipcheck.s
riscv-none-elf-gc -o ipcheck.elf ipcheck.o
Run ipcheck.elf using the command line below:
$ ../../build/rv32emu ipcheck.elf
Then we can check the ELF file by following three instructions.
-d : Display the assembler mnemonics for the machine instructions
$ riscv-none-elf-objdump -d ipcheck.elf
We will get the machine instructions below.
ipcheck.elf: file format elf32-littleriscv
Disassembly of section .text:
00010094 <exit>:
10094: 1141 add sp,sp,-16
10096: 4581 li a1,0
10098: c422 sw s0,8(sp)
1009a: c606 sw ra,12(sp)
1009c: 842a mv s0,a0
1009e: 2bf000ef jal 10b5c <__call_exitprocs>
100a2: da01a783 lw a5,-608(gp) # 125b0 <__stdio_exit_handler>
100a6: c391 beqz a5,100aa <exit+0x16>
100a8: 9782 jalr a5
100aa: 8522 mv a0,s0
100ac: 023010ef jal 118ce <_exit>
000100b0 <register_fini>:
100b0: 00000793 li a5,0
100b4: c791 beqz a5,100c0 <register_fini+0x10>
100b6: 00001517 auipc a0,0x1
100ba: 91c50513 add a0,a0,-1764 # 109d2 <__libc_fini_array>
100be: a4ed j 103a8 <atexit>
100c0: 8082 ret
000100c2 <_start>:
100c2: 00002197 auipc gp,0x2
100c6: 74e18193 add gp,gp,1870 # 12810 <__global_pointer$>
100ca: d9c18513 add a0,gp,-612 # 125ac <completed.1>
100ce: 0e418613 add a2,gp,228 # 128f4 <__BSS_END__>
100d2: 8e09 sub a2,a2,a0
100d4: 4581 li a1,0
100d6: 14d000ef jal 10a22 <memset>
100da: 00001517 auipc a0,0x1
100de: 8f850513 add a0,a0,-1800 # 109d2 <__libc_fini_array>
100e2: 24d9 jal 103a8 <atexit>
100e4: 085000ef jal 10968 <__libc_init_array>
100e8: 4502 lw a0,0(sp)
100ea: 004c add a1,sp,4
100ec: 4601 li a2,0
100ee: 2899 jal 10144 <main>
100f0: b755 j 10094 <exit>
000100f2 <__do_global_dtors_aux>:
100f2: 1141 add sp,sp,-16
100f4: c422 sw s0,8(sp)
100f6: d9c18413 add s0,gp,-612 # 125ac <completed.1>
100fa: 00044783 lbu a5,0(s0)
100fe: c606 sw ra,12(sp)
10100: ef99 bnez a5,1011e <__do_global_dtors_aux+0x2c>
10102: 00000793 li a5,0
10106: cb89 beqz a5,10118 <__do_global_dtors_aux+0x26>
10108: 00002517 auipc a0,0x2
1010c: ef850513 add a0,a0,-264 # 12000 <__EH_FRAME_BEGIN__>
10110: 00000097 auipc ra,0x0
10114: 000000e7 jalr zero # 0 <str_dot_len-0x1>
10118: 4785 li a5,1
1011a: 00f40023 sb a5,0(s0)
1011e: 40b2 lw ra,12(sp)
10120: 4422 lw s0,8(sp)
10122: 0141 add sp,sp,16
10124: 8082 ret
00010126 <frame_dummy>:
10126: 00000793 li a5,0
1012a: cb99 beqz a5,10140 <frame_dummy+0x1a>
1012c: ddc18593 add a1,gp,-548 # 125ec <object.0>
10130: 00002517 auipc a0,0x2
10134: ed050513 add a0,a0,-304 # 12000 <__EH_FRAME_BEGIN__>
10138: 00000317 auipc t1,0x0
1013c: 00000067 jr zero # 0 <str_dot_len-0x1>
10140: 8082 ret
...
-h : Display the ELF file header
$ riscv-none-elf-readelf -h ipcheck.elf
We wil get the following output:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100c2
Start of program headers: 52 (bytes into file)
Start of section headers: 16304 (bytes into file)
Flags: 0x1
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 14
Section header string table index: 13
list the section sizes—and the total size—for each of the object or archive files objfile in its argument list.
$ riscv-none-elf-size ipcheck.elf
We wil get the following output:
text data bss dec hex filename
6444 1452 840 8736 2220 ipcheck.elf
Furthermore, we can check our register usage and instruction frequency by following two commands.
We have to use
make tool
commond inrv32emu
folder, then we can use therv_histogram
command.
$ ../../build/rv_histogram -a ipcheck.elf
$ ../../build/rv_histogram -r ipcheck.elf
We can add the ticks.c into our C code thus we can analyze the cycles.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
typedef uint64_t ticks;
static inline ticks getticks(void)
{
uint64_t result;
uint32_t l, h, h2;
asm volatile(
"rdcycleh %0\n"
"rdcycle %1\n"
"rdcycleh %2\n"
"sub %0, %0, %2\n"
"seqz %0, %0\n"
"sub %0, zero, %0\n"
"and %1, %1, %0\n"
: "=r"(h), "=r"(l), "=r"(h2));
result = (((uint64_t) h) << 32) | ((uint64_t) l);
return result;
}
typedef struct {
uint32_t ip_prefix;
uint8_t prefix_length;
uint16_t leading_zeros;
} RouteEntry;
uint16_t count_leading_zeros(uint32_t x) {
if (x == 0) return 32;
int n = 1;
if ((x >> 16) == 0) { n += 16; x <<= 16; }
if ((x >> 24) == 0) { n += 8; x <<= 8; }
if ((x >> 28) == 0) { n += 4; x <<= 4; }
if ((x >> 30) == 0) { n += 2; x <<= 2; }
n = n - (x >> 31);
return n;
}
int is_prefix_match(uint32_t target_ip, uint32_t prefix, uint8_t prefix_length) {
uint32_t mask = ~((1 << (32 - prefix_length)) - 1);
return (target_ip & mask) == prefix;
}
void find_best_match(uint32_t target_ip, RouteEntry* routes, int num_routes) {
int best_match_index = -1;
uint16_t max_leading_zeros = 0;
printf("Target IP: %u.%u.%u.%u\n",
(target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF);
printf("CLZ values for each route:\n");
for (int i = 0; i < num_routes; i++) {
if (is_prefix_match(target_ip, routes[i].ip_prefix, routes[i].prefix_length)) {
uint32_t xor_result = target_ip ^ routes[i].ip_prefix;
routes[i].leading_zeros = count_leading_zeros(xor_result);
printf("Route %u.%u.%u.%u/%u: CLZ = %u\n",
(routes[i].ip_prefix >> 24) & 0xFF, (routes[i].ip_prefix >> 16) & 0xFF,
(routes[i].ip_prefix >> 8) & 0xFF, routes[i].ip_prefix & 0xFF,
routes[i].prefix_length, routes[i].leading_zeros);
if (routes[i].leading_zeros > max_leading_zeros) {
max_leading_zeros = routes[i].leading_zeros;
best_match_index = i;
}
}
}
if (best_match_index != -1) {
printf("Best matching route for IP %u.%u.%u.%u is: %u.%u.%u.%u/%u\n\n",
(target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF,
(routes[best_match_index].ip_prefix >> 24) & 0xFF, (routes[best_match_index].ip_prefix >> 16) & 0xFF,
(routes[best_match_index].ip_prefix >> 8) & 0xFF, routes[best_match_index].ip_prefix & 0xFF,
routes[best_match_index].prefix_length);
} else {
printf("No matching route found for IP %u.%u.%u.%u\n\n",
(target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF);
}
}
int main() {
RouteEntry routes[] = {
{0x0A000000, 8, 0}, // 10.0.0.0/8
{0x0A010000, 16, 0}, // 10.1.0.0/16
{0x0A010100, 24, 0}, // 10.1.1.0/24
{0xC0A80000, 16, 0} // 192.168.0.0/16
};
int num_routes = sizeof(routes) / sizeof(routes[0]);
uint32_t ips[] = {
0x0A010137, // 10.1.1.55
0x0A0000FF, // 10.0.0.255
0xC0A80001 // 192.168.0.1
};
int num_ips = sizeof(ips) / sizeof(ips[0]);
ticks t0 = getticks();
for (int i = 0; i < num_ips; i++) {
find_best_match(ips[i], routes, num_routes);
}
ticks t1 = getticks();
printf("elapsed cycle: %" PRIu64 "\n", t1 - t0);
return 0;
}
ELF file header
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100c2
Start of program headers: 52 (bytes into file)
Start of section headers: 69436 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
ELF file size
text data bss dec hex filename
52966 1876 1528 56370 dc32 ipcheck.elf
Elapsed cycle : 46375
Instruction Frequency Histogram :
Target Register Usage Histogram :
ELF file header
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100c2
Start of program headers: 52 (bytes into file)
Start of section headers: 69436 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
ELF file size
text data bss dec hex filename
52274 1876 1528 55678 d97e ipcheck.elf
Elapsed cycle : 45408
Instruction Frequency Histogram :
Target Register Usage Histogram :
ELF file header
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100b2
Start of program headers: 52 (bytes into file)
Start of section headers: 69452 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
ELF file size
text data bss dec hex filename
52306 1876 1528 55710 d99e ipcheck.elf
Elapsed cycle : 45403
Instruction Frequency Histogram :
Target Register Usage Histogram :
ELF file header
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x1061a
Start of program headers: 52 (bytes into file)
Start of section headers: 69860 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
ELF file size
text data bss dec hex filename
53528 1876 1528 56932 de64 ipcheck.elf
Elapsed cycle : 45174
Instruction Frequency Histogram :
Target Register Usage Histogram :
ELF file header
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x1061a
Start of program headers: 52 (bytes into file)
Start of section headers: 69860 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
ELF file size
text data bss dec hex filename
53528 1876 1528 56932 de64 ipcheck.elf
Elapsed cycle : 45174
Instruction Frequency Histogram :
Target Register Usage Histogram :
ELF file header
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x1017e
Start of program headers: 52 (bytes into file)
Start of section headers: 69452 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
ELF file size
text data bss dec hex filename
52248 1876 1528 55652 d964 ipcheck.elf
Elapsed cycle : 45438
Instruction Frequency Histogram :
Target Register Usage Histogram :
Rv32emu and Ripes may not work together, therefore we have to rewrite the assembly code before we compiler it. We can check the docs/syscall.md and src/syscall.c in advance.
For example:
We have to change exit
system call from
li a7, 10 # exit program
ecall
into
li a0, 0
li a7, 93 # exit program
ecall
The code below is my rewrited assembly code :
.org 0
.global main
# New syscall
.set SYSEXIT, 93
.set SYSWRITE, 64
.data
routes:
.word 0x0A000000, 8, 0
.word 0x0A010000, 16, 0
.word 0x0A010100, 24, 0
.word 0xC0A80000, 16, 0
ips:
.word 0x0A010137
.word 0x0A0000FF
.word 0xC0A80001
num_routes:
.word 4
num_ips: .word 3
nextline:
.ascii "\n"
.set str_next_len, .-nextline
best_match_ip:
.word 0
best_match_prefix:
.word 0
dot:
.ascii "."
.set str_dot_len, .-dot
slash:
.ascii "/"
.set str_slash_len, .-slash
.text
main:
# Load base addresses of routes, ips, num_routes, and num_ips into registers
la a1, routes
la t1, ips
la t0, num_routes
lw a2, 0(t0)
la t0, num_ips
lw a3, 0(t0)
# Loop through each IP and find the best match
ip_loop:
beqz a3, exit # If no more IPs, exit the loop
lw a0, 0(t1) # ips Load the current IP into a0
j find_best_match # Call the find_best_match function
exit:
li a7, SYSEXIT
addi a0, x0, 0
ecall
count_leading_zeros:
addi a4, zero, 1
srli s2, t2, 16
bne s2, zero, bs24
addi a4, a4, 16
slli t2, t2, 16
bs24:
srli s2, t2, 24
bne s2, zero, bs28
addi a4, a4, 8
slli t2, t2, 8
bs28:
srli s2, t2, 28
bne s2, zero, bs30
addi a4, a4, 4
slli t2, t2, 4
bs30:
srli s2, t2, 30
bne s2, zero, bs31
addi a4, a4, 2
slli t2, t2, 2
bs31:
srli t2, t2, 31
sub a4, a4, t2
ret
is_prefix_match:
# a0: target_ip
# a1: prefix
# a4: prefix_length
# Calculate the mask based on prefix_length
li t2, 32 # Load 32 into t2
sub t2, t2, a4 # Subtract prefix_length from 32
li t3, 1 # Load 1 into t3
sll t3, t3, t2 # Shift left 1 by the result of the subtraction
addi t3, t3, -1 # Subtract 1 from the result to get a mask with leading zeros
not t2, t3 # Bitwise NOT to get the mask
# Apply the mask to target_ip
and t2, a0, t2 # AND operation between target_ip and mask
# Compare the result with prefix
li a5, 0 # Set return value to 0 (no match)
beq t2, t0, match # If they are equal, it's a match
ret
match:
li a5, 1 # Set return value to 1 (match)
ret
find_best_match:
# a0: target_ip
# a1: base address of routes
# a2: num_routes
# Initialize best_match_index, max_leading_zeros, and route_index
li t4, -1 # t4 will hold best_match_index
li t5, 0 # t5 will hold max_leading_zeros
li t6, 0 # t6 will hold route_index (number of routes traversed)
li s3, 0 # s3 will hold best_match_ip
li s4, 0 # s4 will hold best_match_prefix
loop_routes:
beqz a2, end_loop # If no more routes, end the loop
# Load prefix and prefix_length from routes
lw t0, 0(a1) # t0 holds ip_prefix
lbu a4, 4(a1) # a3 holds prefix_length
# Check if prefix matches
jal ra, is_prefix_match
beqz a5, skip_route # If no match, skip to next route
# Calculate leading zeros
xor t2, a0, t0 # XOR target_ip and ip_prefix
jal ra, count_leading_zeros
# Check if current route has more leading zeros than previous best
blt t5, a4, update_best
# If you reach here, it means the route matched but was not the best match
# So, you can skip to the next route
j skip_route
skip_route:
addi a1, a1, 12 # Move to the next route entry
addi a2, a2, -1 # Decrement num_routes
addi t6, t6, 1 # Increment route_index
j loop_routes
update_best:
mv t5, a4 # Update max_leading_zeros
mv t4, t6 # Update best_match_index with current route_index
lw s3, 0(a1) # Save the best match IP from routes
lbu s4, 4(a1) # Save the best match prefix from routes
j skip_route
print_ip:
# s3 contains the IP to be printed
# Extract and print the first octet
srli s5, s3, 24
andi s5, s5, 0xFF
addi sp, sp, -16
sw s5, 0(sp)
sw a1, 4(sp)
sw a2, 8(sp)
sw ra, 12(sp)
addi a1, sp, 0
li a7, SYSWRITE
li a0, 1
li a2, 4
ecall
# Print a dot
li a7, SYSWRITE
li a0, 1
la a1, dot
li a2, str_dot_len
ecall
# Extract and print the second octet
srli s5, s3, 16
andi s5, s5, 0xFF
sw s5, 0(sp)
addi a1, sp, 0
li a7, SYSWRITE
li a0, 1
li a2, 4
ecall
li a7, SYSWRITE
li a0, 1
la a1, dot
li a2, str_dot_len
ecall
srli s5, s3, 8 # Extract and print the third octet
andi s5, s5, 0xFF
sw s5, 0(sp)
addi a1, sp, 0
li a7, SYSWRITE
li a0, 1
li a2, 4
ecall
# Print a dot
li a7, SYSWRITE
li a0, 1
la a1, dot
li a2, str_dot_len
ecall
andi s5, s3, 0xFF # Extract and print the fourth octet
sw s5, 0(sp)
addi a1, sp, 0
li a7, SYSWRITE
li a0, 1
li a2, 4
ecall
lw a1, 4(sp)
lw a2, 8(sp)
lw ra, 12(sp)
addi sp, sp, 16
ret
end_loop:
mv a0, s3 # Print the best match IP
jal ra, print_ip
addi sp, sp, -12
sw a1, 4(sp)
sw a2, 8(sp)
li a7, SYSWRITE
li a0, 1
la a1, slash
li a2, str_slash_len
ecall
sw s4, 0(sp)
addi a1, sp, 0
li a7, SYSWRITE
li a0, 1
li a2, 4
ecall
li a7, SYSWRITE
li a0, 1
la a1, nextline
li a2, str_next_len
ecall
lw a1, 4(sp)
lw a2, 8(sp)
addi sp, sp, 12
li a2, 4
addi t1, t1, 4 # Move to the next IP
addi a3, a3, -1 # Decrement the IP count
la a1, routes
j ip_loop
But there are some trouble with displaying numbers when I excute the program. The numbers can't display correctly.
.../
.../
. ../
inferior exit code 0
After googling, this article provide some methods to solve this problem. We can use printf
commond to display integer. Thus, I rewrite the assembly code into printf version again.
.org 0
.global main
# New syscall
.set SYSEXIT, 93
.set SYSWRITE, 64
.data
routes:
.word 0x0A000000, 8, 0
.word 0x0A010000, 16, 0
.word 0x0A010100, 24, 0
.word 0xC0A80000, 16, 0
ips:
.word 0x0A010137
.word 0x0A0000FF
.word 0xC0A80001
iformat:
.string "%d"
num_routes:
.word 4
num_ips: .word 3
nextline:
.ascii "\n"
.set str_next_len, .-nextline
best_match_ip:
.word 0
best_match_prefix:
.word 0
dot:
.ascii "."
.set str_dot_len, .-dot
slash:
.ascii "/"
.set str_slash_len, .-slash
.text
main:
# Load base addresses of routes, ips, num_routes, and num_ips into registers
la a1, routes
la t1, ips
la t0, num_routes
lw a2, 0(t0)
la t0, num_ips
lw a3, 0(t0)
# Loop through each IP and find the best match
ip_loop:
beqz a3, exit # If no more IPs, exit the loop
lw a0, 0(t1) # ips Load the current IP into a0
j find_best_match # Call the find_best_match function
exit:
li a7, SYSEXIT
addi a0, x0, 0
ecall
count_leading_zeros:
addi a4, zero, 1
srli s2, t2, 16
bne s2, zero, bs24
addi a4, a4, 16
slli t2, t2, 16
bs24:
srli s2, t2, 24
bne s2, zero, bs28
addi a4, a4, 8
slli t2, t2, 8
bs28:
srli s2, t2, 28
bne s2, zero, bs30
addi a4, a4, 4
slli t2, t2, 4
bs30:
srli s2, t2, 30
bne s2, zero, bs31
addi a4, a4, 2
slli t2, t2, 2
bs31:
srli t2, t2, 31
sub a4, a4, t2
ret
is_prefix_match:
# a0: target_ip
# a1: prefix
# a4: prefix_length
# Calculate the mask based on prefix_length
li t2, 32 # Load 32 into t2
sub t2, t2, a4 # Subtract prefix_length from 32
li t3, 1 # Load 1 into t3
sll t3, t3, t2 # Shift left 1 by the result of the subtraction
addi t3, t3, -1 # Subtract 1 from the result to get a mask with leading zeros
not t2, t3 # Bitwise NOT to get the mask
# Apply the mask to target_ip
and t2, a0, t2 # AND operation between target_ip and mask
# Compare the result with prefix
li a5, 0 # Set return value to 0 (no match)
beq t2, t0, match # If they are equal, it's a match
ret
match:
li a5, 1 # Set return value to 1 (match)
ret
find_best_match:
# a0: target_ip
# a1: base address of routes
# a2: num_routes
# Initialize best_match_index, max_leading_zeros, and route_index
li t4, -1 # t4 will hold best_match_index
li t5, 0 # t5 will hold max_leading_zeros
li t6, 0 # t6 will hold route_index (number of routes traversed)
li s3, 0 # s3 will hold best_match_ip
li s4, 0 # s4 will hold best_match_prefix
loop_routes:
beqz a2, end_loop # If no more routes, end the loop
# Load prefix and prefix_length from routes
lw t0, 0(a1) # t0 holds ip_prefix
lbu a4, 4(a1) # a3 holds prefix_length
# Check if prefix matches
jal ra, is_prefix_match
beqz a5, skip_route # If no match, skip to next route
# Calculate leading zeros
xor t2, a0, t0 # XOR target_ip and ip_prefix
jal ra, count_leading_zeros
# Check if current route has more leading zeros than previous best
blt t5, a4, update_best
# If you reach here, it means the route matched but was not the best match
# So, you can skip to the next route
j skip_route
skip_route:
addi a1, a1, 12 # Move to the next route entry
addi a2, a2, -1 # Decrement num_routes
addi t6, t6, 1 # Increment route_index
j loop_routes
update_best:
mv t5, a4 # Update max_leading_zeros
mv t4, t6 # Update best_match_index with current route_index
lw s3, 0(a1) # Save the best match IP from routes
lbu s4, 4(a1) # Save the best match prefix from routes
j skip_route
print_ip:
# s3 contains the IP to be printed
# Extract and print the first octet
srli s5, s3, 24
andi s5, s5, 0xFF
addi sp, sp, -16
sw s5, 0(sp)
sw a1, 4(sp)
sw a2, 8(sp)
sw ra, 12(sp)
la a0, iformat
lw a1, 0(sp)
call printf
# Print a dot
li a7, SYSWRITE
li a0, 1
la a1, dot
li a2, str_dot_len
ecall
# Extract and print the second octet
srli s5, s3, 16
andi s5, s5, 0xFF
sw s5, 0(sp)
la a0, iformat
lw a1, 0(sp)
call printf
li a7, SYSWRITE
li a0, 1
la a1, dot
li a2, str_dot_len
ecall
srli s5, s3, 8 # Extract and print the third octet
andi s5, s5, 0xFF
sw s5, 0(sp)
la a0, iformat
lw a1, 0(sp)
call printf
# Print a dot
li a7, SYSWRITE
li a0, 1
la a1, dot
li a2, str_dot_len
ecall
andi s5, s3, 0xFF # Extract and print the fourth octet
sw s5, 0(sp)
la a0, iformat
lw a1, 0(sp)
call printf
lw a1, 4(sp)
lw a2, 8(sp)
lw ra, 12(sp)
addi sp, sp, 16
ret
end_loop:
mv a0, s3 # Print the best match IP
jal ra, print_ip
addi sp, sp, -16
sw a1, 4(sp)
sw a2, 8(sp)
sw ra, 12(sp)
li a7, SYSWRITE
li a0, 1
la a1, slash
li a2, str_slash_len
ecall
sw s4, 0(sp)
la a0, iformat
lw a1, 0(sp)
call printf
li a7, SYSWRITE
li a0, 1
la a1, nextline
li a2, str_next_len
ecall
lw a1, 4(sp)
lw a2, 8(sp)
lw ra, 12(sp)
addi sp, sp, 16
li a2, 4
addi t1, t1, 4 # Move to the next IP
addi a3, a3, -1 # Decrement the IP count
la a1, routes
j ip_loop
However, it still can't print the result correctly. Rv32emu seems not support printf
commond. Therefore, I decide to use the first assembly code to generate ELF file and analyze it.
ELF file header
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100c2
Start of program headers: 52 (bytes into file)
Start of section headers: 16304 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 14
Section header string table index: 13
ELF file size
text data bss dec hex filename
6444 1452 840 8736 2220 ipcheck.elf
cycle count : 666
Instruction Frequency Histogram :
Target Register Usage Histogram :
Comparing handwriting assemble with assembly code generated by riscv-none-elf-gcc.
Handwriting | O0 | O1 | O2 | O3 | Os | Ofast | |
---|---|---|---|---|---|---|---|
Size | 6444 | 52966 | 52274 | 52306 | 53528 | 52248 | 53528 |
We can notice there is a tremendous difference in size between handwriting assembly code and generated code. Furthermore, Os optimized has smallest among the optimization levels.
Specifies the degree to which the generated code is optimized for speed and binary size.
The different among the optimization level are shown below.
None[-O0]
Do not optimize. With this setting, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results.
Fast[-O1]
Optimizing compilation takes somewhat more time, and a lot more memory for a large function. With this setting, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time
Faster[-O2]
The compiler performs nearly all supported optimizations that do not involve a space-speed tradeoff. With this setting, the compiler does not perform loop unrolling or function inlining, or register renaming. As compared to the 'Fast' setting, this setting increases both compilation time and the performance of the generated code.
Fastest[-O3]
Turns on all optimizations specified by the 'Faster' setting and also turns on function inlining and register renaming options. This setting may result in a larger binary.
Fastest, Smallest[-Os]
Optimize for size. This setting enables all 'Faster' optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
Fastest, Aggressive Optimizations[-Ofast]
This setting enables 'Fastest' but also enables aggressive optimizations that may break strict standards compliance but should work well on well-behaved code.
We can also compare the cycles among different optimization level
Handwriting | O0 | O1 | O2 | O3 | Os | Ofast | |
---|---|---|---|---|---|---|---|
cycles | 666 | 46375 | 45408 | 45403 | 45174 | 45438 | 45174 |
We can notice that handwriting assembly code still use less cycles and Ofast optimized has smallest cycles among the optimization levels. However, as the optimization level decrease, it will use less cycles to excute the program. But the cycles of Os is worse than others. But there
are only slight difference among the six optimization level
Furthermore, I found whichever optimization level I chose, the text size and cycles of handwriting assembly code have always been the same.
I am interested in which C instruction spend the most cycles. After trying, I found printf
spend the majority of the cycles. Thus, I decide to remove all printf
in my C code and count the cycles. Without printf
the cycles reduce obviously, and the results are shown below.
O0 | O1 | O2 | O3 | Os | Ofast | |
---|---|---|---|---|---|---|
Count cycles | 1135 | 356 | 279 | 7 | 429 | 7 |