Assignment1: RISC-V Assembly and Instruction Pipeline

# Assignment1: RISC-V Assembly and Instruction Pipeline contributed by < [`fan1071221`](https://github.com/fan1071221/Computer_Architecture_2023_hw1) > ## **A Novel Approach to IP Routing: CLZ for Prefix Match** > Description: > In the current networking landscape, when multiple prefixes overlap, certain addresses might match with several prefixes. Given the ==32-bit structure of IPv4 addresses==, there's a potential for a vast array of prefix combinations in the routing tables. "Count Leading Zero" (CLZ) introduces a unique method, allowing for the calculation of consecutive zeros starting from the most significant bit of a binary number. My primary motivation is to explore how CLZ can be adeptly combined with other existing techniques to tackle the issue of overlapping prefixes in routing tables. > Overlapping Prefixes in Routing Tables >- **Issue**: Prefixes in the routing table can overlap. >- **Implication**: Some addresses might match multiple prefixes. ## **Algorithm Steps** ### **1. Define Data Structure** Define a structure named RouteEntry that contains the IP prefix, prefix length, and the number of leading zeros. ```c typedef struct { uint32_t ip_prefix; uint8_t prefix_length; uint16_t leading_zeros; } RouteEntry; ``` ### **2. Check if IP Matches with the Prefix** Based on the given prefix length, compute the corresponding subnet mask. Use the ==subnet mask and the target IP to perform an AND operation==, and check if the result matches the prefix. ### **3. Calculate the Number of Leading Zeros** If the target IP matches the prefix of the route entry, I use bitwise operations to calculate the number of leading zeros in the ==XOR result between the target IP and the IP prefix of the entry.== ### **4. Find the Best Match** Initialize the best match index as -1 and the maximum number of leading zeros as 0. Traverse all route entries, and for each entry: Check if the target IP matches the prefix of the entry. If it matches, calculate the number of leading zeros. If the number of leading zeros for the entry is greater than the current maximum, update the best match index and the maximum number of leading zeros. ### **5. Execute the Main Function** Define a set of route entries and a set of target IPs, then traverse each target IP and find its best matching route entry. ## **Example** ### **1. Define Data Structure** We have a `RouteEntry` structure that contains: - `ip_prefix`: The starting IP of the route. - `prefix_length`: The number of bits that define the network portion of the IP. - `leading_zeros`: The number of leading zeros in the XOR result between the target IP and the IP prefix. Example: ```c RouteEntry route = {0x0A010000, 16, 0}; #This represents the route 10.1.0.0/16. ``` ### **2. Check if IP Matches with the Prefix** Suppose our target IP is `10.1.1.55` which in hexadecimal is `0x0A010137`. For the route `10.1.0.0/16`, the subnet mask is `255.255.0.0`. When we perform an AND operation between the target IP and the subnet mask, we get `10.1.0.0` which matches the prefix of the route. ### **3. Calculate the Number of Leading Zeros** To calculate the number of leading zeros between the target IP and the route prefix, we XOR them: 10.1.1.55 (0x0A010137) XOR 10.1.0.0 (0x0A010000) = 0x00000137 The binary representation of `0x00000137` is `00000000000000000000000100110111`. The number of leading zeros is 23. ### **4. Find the Best Match** Let's say we have the following routes: - 10.0.0.0/8 - 10.1.0.0/16 - 10.1.1.0/24 - 192.168.0.0/16 For the target IP `10.1.1.55`: - It matches with `10.0.0.0/8` with 15 leading zeros. - It matches with `10.1.0.0/16` with 23 leading zeros. - It matches with `10.1.1.0/24` with 31 leading zeros. The route `10.1.1.0/24` has the highest number of leading zeros (31), so it's the best match for the target IP `10.1.1.55`. ## **Implementation** ### **C program** ```c #include <stdio.h> #include <stdint.h> typedef struct { uint32_t ip_prefix; uint8_t prefix_length; uint16_t leading_zeros; } RouteEntry; uint16_t count_leading_zeros(uint32_t value) { uint32_t x = value; x |= (x >> 1); x |= (x >> 2); x |= (x >> 4); x |= (x >> 8); x |= (x >> 16); x -= ((x >> 1) & 0x55555555); x = ((x >> 2) & 0x33333333) + (x & 0x33333333); x = ((x >> 4) + x) & 0x0f0f0f0f; x += (x >> 8); x += (x >> 16); return (32 - (x & 0x3f)); } int is_prefix_match(uint32_t target_ip, uint32_t prefix, uint8_t prefix_length) { uint32_t mask = ~((1 << (32 - prefix_length)) - 1); return (target_ip & mask) == prefix; } void find_best_match(uint32_t target_ip, RouteEntry* routes, int num_routes) { int best_match_index = -1; uint16_t max_leading_zeros = 0; printf("Target IP: %u.%u.%u.%u\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF); printf("CLZ values for each route:\n"); for (int i = 0; i < num_routes; i++) { if (is_prefix_match(target_ip, routes[i].ip_prefix, routes[i].prefix_length)) { uint32_t xor_result = target_ip ^ routes[i].ip_prefix; routes[i].leading_zeros = count_leading_zeros(xor_result); printf("Route %u.%u.%u.%u/%u: CLZ = %u\n", (routes[i].ip_prefix >> 24) & 0xFF, (routes[i].ip_prefix >> 16) & 0xFF, (routes[i].ip_prefix >> 8) & 0xFF, routes[i].ip_prefix & 0xFF, routes[i].prefix_length, routes[i].leading_zeros); if (routes[i].leading_zeros > max_leading_zeros) { max_leading_zeros = routes[i].leading_zeros; best_match_index = i; } } } if (best_match_index != -1) { printf("Best matching route for IP %u.%u.%u.%u is: %u.%u.%u.%u/%u\n\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF, (routes[best_match_index].ip_prefix >> 24) & 0xFF, (routes[best_match_index].ip_prefix >> 16) & 0xFF, (routes[best_match_index].ip_prefix >> 8) & 0xFF, routes[best_match_index].ip_prefix & 0xFF, routes[best_match_index].prefix_length); } else { printf("No matching route found for IP %u.%u.%u.%u\n\n", (target_ip >> 24) & 0xFF, (target_ip >> 16) & 0xFF, (target_ip >> 8) & 0xFF, target_ip & 0xFF); } } int main() { RouteEntry routes[] = { {0x0A000000, 8, 0}, // 10.0.0.0/8 {0x0A010000, 16, 0}, // 10.1.0.0/16 {0x0A010100, 24, 0}, // 10.1.1.0/24 {0xC0A80000, 16, 0} // 192.168.0.0/16 }; int num_routes = sizeof(routes) / sizeof(routes[0]); uint32_t ips[] = { 0x0A010137, // 10.1.1.55 0x0A0000FF, // 10.0.0.255 0xC0A80001 // 192.168.0.1 }; int num_ips = sizeof(ips) / sizeof(ips[0]); for (int i = 0; i < num_ips; i++) { find_best_match(ips[i], routes, num_routes); } return 0; } ``` ### **C program Ouput** ![](https://hackmd.io/_uploads/ByVpl0kWp.png) ### **Assembly** ```c .data routes: .word 0x0A000000, 8, 0 .word 0x0A010000, 16, 0 .word 0x0A010100, 24, 0 .word 0xC0A80000, 16, 0 ips: .word 0x0A010137 .word 0x0A0000FF .word 0xC0A80001 num_routes: .word 4 num_ips: .word 3 newline: .string "\n" best_match_ip: .word 0 best_match_prefix: .word 0 dot: .string "." slash: .string "/" .text .globl main main: # Load base addresses of routes, ips, num_routes, and num_ips into registers la a1, routes la t1, ips la t0, num_routes lw a2, 0(t0) la t0, num_ips lw a3, 0(t0) # Loop through each IP and find the best match ip_loop: beqz a3, exit # If no more IPs, exit the loop lw a0, 0(t1) # ips Load the current IP into a0 #li a7, 1 # System call for print integer #ecall j find_best_match # Call the find_best_match function #addi t1, t1, 4 # Move to the next IP #addi a3, a3, -1 # Decrement the IP count #j ip_loop exit: # Exit the program li a7, 10 ecall count_leading_zeros: li a4, 32 srli s2, t2, 1 or a6, t2, s2 srli s2, a6, 2 or a6, a6, s2 srli s2, a6, 4 or a6, a6, s2 srli s2, a6, 8 or a6, a6, s2 srli s2, a6, 16 or a6, a6, s2 li s2, 0x55555555 srli t2, a6, 1 and t2, t2, s2 sub a6, a6, t2 li s2, 0x33333333 and t2, a6, s2 srli t3, a6, 2 and t3, t3, s2 add a6, t2, t3 li s2, 0x0f0f0f0f srli t2, a6, 4 add t2, t2, a6 and a6, t2, s2 srli s2, a6, 8 add a6, a6, s2 srli s2, a6, 16 add a6, a6, s2 li s2, 0x7f and a6, a6, s2 sub a4, a4, a6 ret is_prefix_match: # a0: target_ip # a1: prefix # a4: prefix_length # Calculate the mask based on prefix_length li t2, 32 # Load 32 into t2 sub t2, t2, a4 # Subtract prefix_length from 32 li t3, 1 # Load 1 into t3 sll t3, t3, t2 # Shift left 1 by the result of the subtraction addi t3, t3, -1 # Subtract 1 from the result to get a mask with leading zeros not t2, t3 # Bitwise NOT to get the mask # Apply the mask to target_ip and t2, a0, t2 # AND operation between target_ip and mask # Compare the result with prefix li a5, 0 # Set return value to 0 (no match) beq t2, t0, match # If they are equal, it's a match ret match: li a5, 1 # Set return value to 1 (match) ret find_best_match: # a0: target_ip # a1: base address of routes # a2: num_routes # Initialize best_match_index, max_leading_zeros, and route_index li t4, -1 # t4 will hold best_match_index li t5, 0 # t5 will hold max_leading_zeros li t6, 0 # t6 will hold route_index (number of routes traversed) li s3, 0 # s3 will hold best_match_ip li s4, 0 # s4 will hold best_match_prefix loop_routes: beqz a2, end_loop # If no more routes, end the loop # Load prefix and prefix_length from routes lw t0, 0(a1) # t0 holds ip_prefix lbu a4, 4(a1) # a3 holds prefix_length # Check if prefix matches jal ra, is_prefix_match beqz a5, skip_route # If no match, skip to next route # Calculate leading zeros xor t2, a0, t0 # XOR target_ip and ip_prefix jal ra, count_leading_zeros # Check if current route has more leading zeros than previous best blt t5, a4, update_best # If you reach here, it means the route matched but was not the best match # So, you can skip to the next route j skip_route skip_route: addi a1, a1, 12 # Move to the next route entry addi a2, a2, -1 # Decrement num_routes addi t6, t6, 1 # Increment route_index j loop_routes update_best: mv t5, a4 # Update max_leading_zeros mv t4, t6 # Update best_match_index with current route_index lw s3, 0(a1) # Save the best match IP from routes lbu s4, 4(a1) # Save the best match prefix from routes j skip_route print_ip: # s3 contains the IP to be printed # Extract and print the first octet srli s5, s3, 24 andi s5, s5, 0xFF mv a0, s5 li a7, 1 ecall # Print a dot la a0, dot li a7, 4 ecall # Extract and print the second octet srli s5, s3, 16 andi s5, s5, 0xFF mv a0, s5 li a7, 1 ecall # Print a dot la a0, dot li a7, 4 ecall # Extract and print the third octet srli s5, s3, 8 andi s5, s5, 0xFF mv a0, s5 li a7, 1 ecall # Print a dot la a0, dot li a7, 4 ecall # Extract and print the fourth octet andi s5, s3, 0xFF mv a0, s5 li a7, 1 ecall ret end_loop: # Print the best match IP mv a0, s3 jal ra, print_ip la a0, slash li a7, 4 ecall # Print the best match prefix mv a0, s4 li a7, 1 ecall la a0, newline li a7, 4 ecall li a2, 4 addi t1, t1, 4 # Move to the next IP addi a3, a3, -1 # Decrement the IP count la a1, routes j ip_loop ``` ### **Assembly Ouput** ![](https://hackmd.io/_uploads/Bk8ueCkWp.png) ## **Analysis** The code was tested using the [Ripes](https://github.com/mortbopet/Ripes) simulator. Here are the line numbers used for the CLZ function of the disassenbled code. Detailed Analysis of ==srli x18 x7 1== in RISC-V Pipeline ``` 0000003c <count_leading_zeros>: 3c: 02000713 addi x14 x0 32 40: 0013d913 srli x18 x7 1 44: 0123e833 or x16 x7 x18 48: 00285913 srli x18 x16 2 4c: 01286833 or x16 x16 x18 50: 00485913 srli x18 x16 4 54: 01286833 or x16 x16 x18 58: 00885913 srli x18 x16 8 5c: 01286833 or x16 x16 x18 60: 01085913 srli x18 x16 16 64: 01286833 or x16 x16 x18 68: 55555937 lui x18 0x55555 6c: 55590913 addi x18 x18 1365 70: 00185393 srli x7 x16 1 74: 0123f3b3 and x7 x7 x18 78: 40780833 sub x16 x16 x7 7c: 33333937 lui x18 0x33333 80: 33390913 addi x18 x18 819 84: 012873b3 and x7 x16 x18 88: 00285e13 srli x28 x16 2 8c: 012e7e33 and x28 x28 x18 90: 01c38833 add x16 x7 x28 94: 0f0f1937 lui x18 0xf0f1 98: f0f90913 addi x18 x18 -241 9c: 00485393 srli x7 x16 4 a0: 010383b3 add x7 x7 x16 a4: 0123f833 and x16 x7 x18 a8: 00885913 srli x18 x16 8 ac: 01280833 add x16 x16 x18 b0: 01085913 srli x18 x16 16 b4: 01280833 add x16 x16 x18 b8: 07f00913 addi x18 x0 127 bc: 01287833 and x16 x16 x18 c0: 41070733 sub x14 x14 x16 c4: 00008067 jalr x0 x1 0 ``` ### **Introduction** The RISC-V architecture employs a pipeline mechanism to enhance the efficiency of instruction execution. The `srli` instruction, which stands for "Shift Right Logical Immediate," is an I-type instruction. This means it operates on an immediate value and a register operand. In this specific instruction, the value in register `x7` is shifted one bit to the right, and the result is stored in register `x18`. ### **IF (Instruction Fetch) Stage** ![](https://hackmd.io/_uploads/ryX-m5Clp.png) - **Purpose**: The primary goal of this stage is to retrieve the instruction from memory. - **Operation**: - The CPU provides the address `0x00000040` to the instruction memory. - The instruction memory returns the encoded instruction `0x0013d913`. - **Instruction Encoding Explanation**: - The instruction is divided into its constituent fields: `imm[11:0]`, `rs1`, `funct3`, `rd`, and `opcode`. - `imm[11:0]` is the immediate value. For the value `1`, it is represented in binary as `000000000001`. - `rs1` indicates the source register. For `x7`, it is represented in binary as `00111`. - `funct3` differentiates instructions with the same opcode. For `srli`, it is `101`. - `rd` indicates the destination register. For `x18`, it is `10010`. - `opcode` for I-type instructions like `srli` is `0010011`. - Combining these fields, we get the binary representation `000000000001 00111 101 10010 0010011`, which is equivalent to the hexadecimal value `0x0013d913`. - **Next Instruction Calculation**: The address for the subsequent instruction is computed by adding 4 to the current address, resulting in `0x00000044`. ### **ID (Instruction Decode) Stage** ![](https://hackmd.io/_uploads/HyoP99Rla.png) - ### **Purpose**: The ID stage is responsible for decoding the fetched instruction to understand its semantics, the operations it intends to perform, and the registers it interacts with. - ### **Decoding Process**: 1. **Instruction Analysis**: - The instruction `srli x18, x7, 1` is recognized as an I-type instruction in the RISC-V instruction set. This means it operates on a register and an immediate value. 2. **Register Fetch**: - The source register `rs1` is identified as `x7`. The value in `x7` at this stage is `0x00010137`. - The destination register `rd` is identified as `x18`. The initial value in `x18` before the operation is `0x00000000`. 3. **Immediate Value Extraction**: - The immediate value for this instruction is `1`. This value indicates the number of positions the value in `x7` will be shifted to the right. 4. **Instruction Semantics**: - The `srli` instruction stands for "Shift Right Logical Immediate". It will shift the bits of the value in the source register (`x7`) to the right by the number specified in the immediate value. In this case, it will shift the bits of `0x00010137` one position to the right. 5. **Preparation for Execution**: - With the source register value, destination register, and immediate value identified, the instruction is now ready to be passed to the Execute (EX) stage for the actual operation. ### **EX (Execute) Stage** ![](https://hackmd.io/_uploads/SJE4T50g6.png) - **Purpose**: The EX stage is where the actual computation or operation specified by the instruction takes place. For the `srli` instruction, this involves a logical right shift. - **Operation**: - **Input to ALU**: - The ALU (Arithmetic Logic Unit) receives the value from the `x7` register, which at this stage is `0x00010137`. - The immediate value `1` is also fed into the ALU, indicating the number of positions the value in `x7` should be shifted to the right. - **ALU Operation**: - The ALU performs the `srli` operation, which is a logical right shift. This means that the bits of the value in `x7` are shifted one position to the right, and a `0` is inserted at the leftmost position. - For the value `0x00010137`, the result after the right shift by one position is `0x0000809B`. - **Result Storage**: - The result of the ALU operation is stored in the `Res` field, which will be used in subsequent pipeline stages. - This result, `0x0000809B`, will eventually be written back to the `x18` register in the WB (Write Back) stage. ### **MEM (Memory Access) Stage** ![](https://hackmd.io/_uploads/HJhpkjAgp.png) - **Purpose**: This stage is primarily for instructions that need to access data memory. However, for the `srli` instruction, this stage is somewhat passive. - **Operation**: Since `srli` is an arithmetic instruction, no memory access is required. Therefore, the Data Memory output remains `0x00000000`. ### **WB (Write Back) Stage** ![](https://hackmd.io/_uploads/B1j4ejAla.png) - **Purpose**: The result from the previous stages is written back to the register file in this stage. - **Operation**: - The result from the EX stage (`Res`) is written back to the GPR file. - The `x18` register in the GPR file is updated with the shifted value. - With this, the execution of the `srli x18 x7 1` instruction is completed. - **Wr data**: ![](https://hackmd.io/_uploads/SJ-ux4eZp.png) - **Wr idx** ![](https://hackmd.io/_uploads/HyM1z4g-a.png) ### **Memory Update** In the RISC-V pipeline architecture, the results of the `srli x18 x7 1` instruction are written back to the register only after the instruction has completed its execution and has been overlapped by other instructions in the processor. This design ensures sequential execution of instructions while allowing multiple instructions to be processed simultaneously at different pipeline stages, thereby enhancing overall efficiency. ![](https://hackmd.io/_uploads/rJQGiNeZT.png =50%x)![](https://hackmd.io/_uploads/r12Dj4gWa.png =50%x) --- This detailed breakdown provides a comprehensive view of how the `srli x18 x7 1` instruction travels through the RISC-V pipeline, from fetching to writing back the result.