# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by < [`rayChen`](https://github.com/padaray)>
## Implement palindrome detection and using CLZ
### 1. Using Count leading Zeros
First, calculate leading zeros, and then subtract it from the bit count of the datatype to determine the actual number of bits used.
### 2. Palindrome detection
To detect a palindrome, we need to know the actual number of bits used by the variable and process these bits as follows:
- 1. Passing the actual stored bit count as one of the parameters to `palindrome_detected()`function.
- 2. Reverse the bits based on the actual stored bits.
- 3. Compare the reversed parameter with the input parameter
### 3. Implement Palindrome detection with C
```code=
#include <stdint.h>
#include <iostream>
#include <cstdio>
// count how many zeros forwards input number
uint16_t count_leading_zeros(uint64_t x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x |= (x >> 32);
/* count ones (population count) */
x -= ((x >> 1) & 0x5555555555555555 );
x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
x += (x >> 8);
x += (x >> 16);
x += (x >> 32);
return (64 - (x & 0x7f));
}
bool palindrome_detected(uint64_t x, int clz){
/* tempX = left half of input x (use to reverse and check) */
uint64_t nob = (64 - clz);
uint64_t checkEven = nob % 2;
uint64_t tempX = (x >> (nob / 2));
tempX = (tempX >> checkEven);
printf("tempX1 = %llx\n", tempX);
/* tempY = right half of input x */
uint64_t leftShiftNum = (nob / 2) + checkEven + (64 - nob);
printf("leftShiftNum = %d\n", leftShiftNum);
uint64_t tempY = (x << leftShiftNum);
tempY = (tempY >> leftShiftNum);
/* reverse tempX */
uint64_t revTempX = 0x0;
uint64_t maskTempX = tempX;
for (int i = 0; i < 64; i++)
{
revTempX = (revTempX << 1);
revTempX |= maskTempX & 0x1;
maskTempX = (maskTempX >> 1);
}
revTempX = (revTempX >> leftShiftNum);
printf("revTempX = %llx, tempY = %llx\n", revTempX, tempY);
/* check revTempX nd tempY same or not */
if (revTempX == tempY) {
return true;
} else {
return false;
}
}
int main(){
uint64_t testA = 0x0000000000000000; //0 is palindrome
uint64_t testB = 0x0000000000000001; //testB not palindrome
uint64_t testC = 0x00000C0000000003; //testC is palindrome
uint64_t testD = 0x0F000000000000F0; //testD not palindrome
printf("%d\n", palindrome_detected(testA, count_leading_zeros(testA)));
printf("%d\n", palindrome_detected(testB, count_leading_zeros(testB)));
printf("%d\n", palindrome_detected(testC, count_leading_zeros(testC)));
printf("%d\n", palindrome_detected(testD, count_leading_zeros(testD)));
}
```
### 4. Assembly Code
Assembly Code [`Hyperlink`](https://github.com/padaray/computer_architecture_hw1/blob/main/HW1.s)
### 5. Analysis
Use [Ripes](https://github.com/mortbopet/Ripes) to simulate RISC-V processor
#### Ripes Simulator
Ripes Simulator is an open-source, web-based RISC-V computer architecture simulator designed for educational and research purposes. It utilizes a **5-stage pipeline** architecture.

#### 5-stage pipeline
1. **IF**(Instruction Fetch) : CPU reads instruction pipeline from the address in the memory whose value is present in the program counter(PC)
3. **ID**(Instruction Decode) : instruction is decoded and the register file is accessed to get the values from the register used int the instruction.
4. **EX**(Instruction Execute) : use ALU to execute calculations or compute memory addresses
5. **MEM**(Memory Access) : memory operands are read and written from/to the memory that is present in the instruction.
6. **WB**(Wirte Back) : the value which is Computed or Fetched will be written back to the register present in the instructions.
#### Instruction Pipeline
We will use the `addi x10 x0 17 ` instruction to demonstrate how the pipeline works.
##### IF stage

* Transfer the value of the PC register to the instruction memory for instruction fetching.
* Determine whether to load a new PC value based on stall and flush control signals.
##### ID stage

* After decode, identify that the opcode is an i-type instruction, and pass the values of 17 and the x0 register to ID/EX register.
##### EX stage

* The ALU adds the values of 17 and x0 and sends the result to EX/MEM register.
##### MEM stage

* There is no need for memory access, the value calculated by the ALU is directly passed to the MEM/WB register.
##### WB stage

* Write the calculated value back to the registers.
### Cache Performance
#### Data Cache

#### Instruction Cache

### Assembly Optimization
#### Using loop unrolling
```code=
# t2 = reversed t3
li t2, 0
li t5, 0
li t0, 32
reverse_loop:
srl t6, t3, t5
andi t6, t6, 1
slli t2, t2, 1
or t2, t2, t6
addi t5, t5, 1 # loop+1
blt t5, t0, reverse_loop
```
before using loop unrolling performance is like below.

```code=
# Use loop unrolling
# t2 = reversed t3
li t2, 0
li t5, 0
li t0, 16
reverse_loop:
srl t6, t3, t5
andi t6, t6, 1
slli t2, t2, 1
or t2, t2, t6
srl t6, t3, t5
andi t6, t6, 1
slli t2, t2, 1
or t2, t2, t6
addi t5, t5, 1 # loop+1
blt t5, t0, reverse_loop
```
after using loop unrolling performance is like below.
