# Assignment1: RISC-V Assembly and Instruction Pipeline contributed by < [`SUE3k`](https://github.com/SUE3K/computer_architecture_hw1) > ## Find the position of MSB by CLZ We can find the position of MSB (Most Significant Bit) using CLZ (Count Leading Zeros) due to the following reasons: 1. CLZ provides a straightforward way to determine the MSB without the need for complex bitwise shifting and comparison operations. It simplifies the process of finding the most significant bit in a number. 2. CLZ efficiently finds the leftmost set bit, making it valuable for various applications across different platforms. ### Motivation When I learned that we were going to apply the Count Leading Zeros (CLZ) operation, my initial idea was that it had a connection with finding the Most Significant Bit (MSB). When we analyzing a binary digit, determining it MSB is not always straightforward. Instead, we often need to count the number of zeros preceding it one by one. It's quite non-intuitive and complex. Therefore, we can intuitively calculate CLZ first and then use that information to find the position of the MSB in a number. So we can immediately know what bit is the MSB. ### C program ```c #include <stdio.h> #include <stdint.h> uint16_t count_exponent(uint64_t x) { x |= (x >> 1); x |= (x >> 2); x |= (x >> 4); x |= (x >> 8); x |= (x >> 16); x |= (x >> 32); /* 計算二進制表示中1的個數,以找到前導零數 */ x -= ((x >> 1) & 0x5555555555555555); x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333); x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f; x += (x >> 8); x += (x >> 16); x += (x >> 32); // 直接返回計算的前導零數 return (32 - (x & 0x7f)); } int main() { uint64_t test_data[] = {0x00000011, 0x00001101, 0x00010011}; for (int i = 0; i < sizeof(test_data) / sizeof(test_data[0]); i++) { uint32_t clz = count_exponent(test_data[i]); if (clz < 32) { uint32_t msb = (uint32_t)(31 - clz); printf("Test Data %d:\n", i + 1); printf("Input: 0x%016llX\n", test_data[i]); printf("Leading Zeros: %u\n", clz); printf("MSB: %u\n", msb); } else { printf("Test Data %d: Invalid input, cannot calculate MSB.\n", i + 1); } printf("\n"); } return 0; } ``` ### Assembly version2 include loops (or recursive calls) and conditional branches. ```c .data test1: .word 17 test2: .word 4353 test3: .word 65553 newline: .string "\n" printf1: .string "leading zeros:" printf2: .string "MSB:" .text counter: li s7, 0 # intial counter li s9, 1 li s8, 0 jal loop loop: beq s7, x0, load_test1 #s7:0->1 beq s7, s9, load_test2 #s7:1->2 beq s8, s9, exit #avoid bge s7, s9, load_test3 #s7:2->3 main: jal counting_process li t4, 31 sub s6, t4, s5 addi s7, s7, 1 jal result load_test1: lw s0, test1 jal ra, main load_test2: lw s0, test2 jal ra, main load_test3: lw s0, test3 addi s8, s8, 1 jal ra, main result: la a0, printf1 li a7, 4 ecall mv a0, s5 li a7, 1 #print n ecall la a0, newline li a7, 4 ecall la a0, printf2 li a7, 4 ecall mv a0, s6 li a7, 1 ecall la a0, newline li a7, 4 ecall jal loop ret exit: li a7, 10 ecall counting_process: #x |= (x >> 1); #x |= (x >> 2); #x |= (x >> 4); #x |= (x >> 8); #x |= (x >> 16); #x |= (x >> 32); srli s1, s0, 1 or s0, s1, s0 srli s1, s0, 2 or s0, s1, s0 srli s1, s0, 4 or s0, s1, s0 srli s1, s0, 8 or s0, s1, s0 srli s1, s0, 16 or s0, s1, s0 #x -= ((x >> 1) & 0x5555555555555555); srli s1, s0, 1 #x>>1 li s2, 0x55555555 and s1, s1, s2 #x>>1 & 0x55.. sub s0, s0, s1 #x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333); srli s1, s0, 2 li s2, 0x33333333 and s1, s1, s2 and s3, s0, s2 add s0, s1, s3 #x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f; srli s1, s0, 4 add s1, s0, s1 li s2, 0x0f0f0f0f and s0, s1, s2 #x += (x >> 8); srli s1, s0, 8 add s0, s0, s1 #x += (x >> 16); srli s1, s0, 16 add s0, s0, s1 #x += (x >> 32); li s1, 0 #srli 32bit = 0 add s0, s0, s1 li t1, 0x0000007f and s1, s0, t1 sub s5, x0, s1 #n=s5 addi s5, s5, 32 ret ``` There are 3 test data in this program: * test1: .word 17 * test2: .word 4353 * test3: .word 65553 Correct answer ![](https://hackmd.io/_uploads/HkhRWHeWT.png) # Analysis & Pipeline Generated from Ripes The 5-stage in-order processor with hazard detection/elimination and forwarding gets correct results on each of test data: ![](https://hackmd.io/_uploads/rk-PC_pl6.png) ``` counter: li s7, 0 # intial counter ``` ## IF stage ![](https://hackmd.io/_uploads/ryXtkY6ea.png) * That observe the `addi x23 x0 0` instruction which is at address `0x00000000` * Program Counter is `0x00000004`, which refers to the next instruction address. * If without any branching occured ,`PC` is supposed to be `IR`+4 * And input of Instruction memory is `0x00000000`. * Compressed decoder output the hex instruction `0x00000b93` ## ID stage ![](https://hackmd.io/_uploads/r1s73jTea.png) * The `addi x23 x0 0` is the I-type instruction. > | IMM | RD | OP | | -------- | -------- | -------- | | 00000000000000000000 | 10111 | 0010011 | * Processor can get OP code `0010011` from Decode and decoding the `addi` instruction. * The desstination register is `x23`. ## EXE stage ![](https://hackmd.io/_uploads/ryRDCh1Zp.png) * Here we use OP1 and OP2 to implement `addi x23 x0 0` ## MEM stage ![](https://hackmd.io/_uploads/SkpHbpyW6.png) * This is an instruction that loads an immediate value doesn't involve memory access. ## WB stage ![](https://hackmd.io/_uploads/rkhgZEl-a.png) * At this stage, the result is written back into register x23, i.e. zero is written into register x23. * The pseudo-instruction "li s1, 0" will actually be converted into a basic instruction. This basic instruction will be executed in the pipeline, and each stage will be processed once. The pseudo-instruction here is converted into an instruction to load an immediate value, so it will take multiple clock cycles in the pipeline to complete. ## CPU analysis without loops ![](https://hackmd.io/_uploads/B1QGN4g-6.png) include loops ![](https://hackmd.io/_uploads/S1tvNElWp.png)