# Assignment1: RISC-V Assembly and Instruction Pipeline contributed by < [`rayChen`](https://github.com/padaray)> ## Implement palindrome detection and using CLZ ### 1. Using Count leading Zeros First, calculate leading zeros, and then subtract it from the bit count of the datatype to determine the actual number of bits used. ### 2. Palindrome detection To detect a palindrome, we need to know the actual number of bits used by the variable and process these bits as follows: - 1. Passing the actual stored bit count as one of the parameters to `palindrome_detected()`function. - 2. Reverse the bits based on the actual stored bits. - 3. Compare the reversed parameter with the input parameter ### 3. Implement Palindrome detection with C ```code= #include <stdint.h> #include <iostream> #include <cstdio> // count how many zeros forwards input number uint16_t count_leading_zeros(uint64_t x) { x |= (x >> 1); x |= (x >> 2); x |= (x >> 4); x |= (x >> 8); x |= (x >> 16); x |= (x >> 32); /* count ones (population count) */ x -= ((x >> 1) & 0x5555555555555555 ); x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333); x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f; x += (x >> 8); x += (x >> 16); x += (x >> 32); return (64 - (x & 0x7f)); } bool palindrome_detected(uint64_t x, int clz){ /* tempX = left half of input x (use to reverse and check) */ uint64_t nob = (64 - clz); uint64_t checkEven = nob % 2; uint64_t tempX = (x >> (nob / 2)); tempX = (tempX >> checkEven); printf("tempX1 = %llx\n", tempX); /* tempY = right half of input x */ uint64_t leftShiftNum = (nob / 2) + checkEven + (64 - nob); printf("leftShiftNum = %d\n", leftShiftNum); uint64_t tempY = (x << leftShiftNum); tempY = (tempY >> leftShiftNum); /* reverse tempX */ uint64_t revTempX = 0x0; uint64_t maskTempX = tempX; for (int i = 0; i < 64; i++) { revTempX = (revTempX << 1); revTempX |= maskTempX & 0x1; maskTempX = (maskTempX >> 1); } revTempX = (revTempX >> leftShiftNum); printf("revTempX = %llx, tempY = %llx\n", revTempX, tempY); /* check revTempX nd tempY same or not */ if (revTempX == tempY) { return true; } else { return false; } } int main(){ uint64_t testA = 0x0000000000000000; //0 is palindrome uint64_t testB = 0x0000000000000001; //testB not palindrome uint64_t testC = 0x00000C0000000003; //testC is palindrome uint64_t testD = 0x0F000000000000F0; //testD not palindrome printf("%d\n", palindrome_detected(testA, count_leading_zeros(testA))); printf("%d\n", palindrome_detected(testB, count_leading_zeros(testB))); printf("%d\n", palindrome_detected(testC, count_leading_zeros(testC))); printf("%d\n", palindrome_detected(testD, count_leading_zeros(testD))); } ``` ### 4. Assembly Code Assembly Code [`Hyperlink`](https://github.com/padaray/computer_architecture_hw1/blob/main/HW1.s) ### 5. Analysis Use [Ripes](https://github.com/mortbopet/Ripes) to simulate RISC-V processor #### Ripes Simulator Ripes Simulator is an open-source, web-based RISC-V computer architecture simulator designed for educational and research purposes. It utilizes a **5-stage pipeline** architecture. ![](https://hackmd.io/_uploads/Hy_tSbl-p.png) #### 5-stage pipeline 1. **IF**(Instruction Fetch) : CPU reads instruction pipeline from the address in the memory whose value is present in the program counter(PC) 3. **ID**(Instruction Decode) : instruction is decoded and the register file is accessed to get the values from the register used int the instruction. 4. **EX**(Instruction Execute) : use ALU to execute calculations or compute memory addresses 5. **MEM**(Memory Access) : memory operands are read and written from/to the memory that is present in the instruction. 6. **WB**(Wirte Back) : the value which is Computed or Fetched will be written back to the register present in the instructions. #### Instruction Pipeline We will use the `addi x10 x0 17 ` instruction to demonstrate how the pipeline works. ##### IF stage ![](https://hackmd.io/_uploads/S1fHr-bWa.png) * Transfer the value of the PC register to the instruction memory for instruction fetching. * Determine whether to load a new PC value based on stall and flush control signals. ##### ID stage ![](https://hackmd.io/_uploads/Hy9iHbb-a.png) * After decode, identify that the opcode is an i-type instruction, and pass the values of 17 and the x0 register to ID/EX register. ##### EX stage ![](https://hackmd.io/_uploads/BJrscWbWT.png) * The ALU adds the values of 17 and x0 and sends the result to EX/MEM register. ##### MEM stage ![](https://hackmd.io/_uploads/ByTY3-Z-T.png) * There is no need for memory access, the value calculated by the ALU is directly passed to the MEM/WB register. ##### WB stage ![](https://hackmd.io/_uploads/Byl32-bZa.png) * Write the calculated value back to the registers. ### Cache Performance #### Data Cache ![](https://hackmd.io/_uploads/H1nI2-lba.png) #### Instruction Cache ![](https://hackmd.io/_uploads/Byzr3Zx-a.png) ### Assembly Optimization #### Using loop unrolling ```code= # t2 = reversed t3 li t2, 0 li t5, 0 li t0, 32 reverse_loop: srl t6, t3, t5 andi t6, t6, 1 slli t2, t2, 1 or t2, t2, t6 addi t5, t5, 1 # loop+1 blt t5, t0, reverse_loop ``` before using loop unrolling performance is like below. ![](https://hackmd.io/_uploads/BkEkobxWp.png) ```code= # Use loop unrolling # t2 = reversed t3 li t2, 0 li t5, 0 li t0, 16 reverse_loop: srl t6, t3, t5 andi t6, t6, 1 slli t2, t2, 1 or t2, t2, t6 srl t6, t3, t5 andi t6, t6, 1 slli t2, t2, 1 or t2, t2, t6 addi t5, t5, 1 # loop+1 blt t5, t0, reverse_loop ``` after using loop unrolling performance is like below. ![](https://hackmd.io/_uploads/rJOqZXxZa.png)