# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by < [`SUE3k`](https://github.com/SUE3K/computer_architecture_hw1) >
## Find the position of MSB by CLZ
We can find the position of MSB (Most Significant Bit) using CLZ (Count Leading Zeros) due to the following reasons:
1. CLZ provides a straightforward way to determine the MSB without the need for complex bitwise shifting and comparison operations. It simplifies the process of finding the most significant bit in a number.
2. CLZ efficiently finds the leftmost set bit, making it valuable for various applications across different platforms.
### Motivation
When I learned that we were going to apply the Count Leading Zeros (CLZ) operation, my initial idea was that it had a connection with finding the Most Significant Bit (MSB). When we analyzing a binary digit, determining it MSB is not always straightforward. Instead, we often need to count the number of zeros preceding it one by one. It's quite non-intuitive and complex. Therefore, we can intuitively calculate CLZ first and then use that information to find the position of the MSB in a number. So we can immediately know what bit is the MSB.
### C program
```c
#include <stdio.h>
#include <stdint.h>
uint16_t count_exponent(uint64_t x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x |= (x >> 32);
/* 計算二進制表示中1的個數,以找到前導零數 */
x -= ((x >> 1) & 0x5555555555555555);
x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
x += (x >> 8);
x += (x >> 16);
x += (x >> 32);
// 直接返回計算的前導零數
return (32 - (x & 0x7f));
}
int main()
{
uint64_t test_data[] = {0x00000011, 0x00001101, 0x00010011};
for (int i = 0; i < sizeof(test_data) / sizeof(test_data[0]); i++)
{
uint32_t clz = count_exponent(test_data[i]);
if (clz < 32)
{
uint32_t msb = (uint32_t)(31 - clz);
printf("Test Data %d:\n", i + 1);
printf("Input: 0x%016llX\n", test_data[i]);
printf("Leading Zeros: %u\n", clz);
printf("MSB: %u\n", msb);
}
else
{
printf("Test Data %d: Invalid input, cannot calculate MSB.\n", i + 1);
}
printf("\n");
}
return 0;
}
```
### Assembly
version2 include loops (or recursive calls) and conditional branches.
```c
.data
test1: .word 17
test2: .word 4353
test3: .word 65553
newline: .string "\n"
printf1: .string "leading zeros:"
printf2: .string "MSB:"
.text
counter:
li s7, 0 # intial counter
li s9, 1
li s8, 0
jal loop
loop:
beq s7, x0, load_test1 #s7:0->1
beq s7, s9, load_test2 #s7:1->2
beq s8, s9, exit #avoid
bge s7, s9, load_test3 #s7:2->3
main:
jal counting_process
li t4, 31
sub s6, t4, s5
addi s7, s7, 1
jal result
load_test1:
lw s0, test1
jal ra, main
load_test2:
lw s0, test2
jal ra, main
load_test3:
lw s0, test3
addi s8, s8, 1
jal ra, main
result:
la a0, printf1
li a7, 4
ecall
mv a0, s5
li a7, 1 #print n
ecall
la a0, newline
li a7, 4
ecall
la a0, printf2
li a7, 4
ecall
mv a0, s6
li a7, 1
ecall
la a0, newline
li a7, 4
ecall
jal loop
ret
exit:
li a7, 10
ecall
counting_process:
#x |= (x >> 1);
#x |= (x >> 2);
#x |= (x >> 4);
#x |= (x >> 8);
#x |= (x >> 16);
#x |= (x >> 32);
srli s1, s0, 1
or s0, s1, s0
srli s1, s0, 2
or s0, s1, s0
srli s1, s0, 4
or s0, s1, s0
srli s1, s0, 8
or s0, s1, s0
srli s1, s0, 16
or s0, s1, s0
#x -= ((x >> 1) & 0x5555555555555555);
srli s1, s0, 1 #x>>1
li s2, 0x55555555
and s1, s1, s2 #x>>1 & 0x55..
sub s0, s0, s1
#x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
srli s1, s0, 2
li s2, 0x33333333
and s1, s1, s2
and s3, s0, s2
add s0, s1, s3
#x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
srli s1, s0, 4
add s1, s0, s1
li s2, 0x0f0f0f0f
and s0, s1, s2
#x += (x >> 8);
srli s1, s0, 8
add s0, s0, s1
#x += (x >> 16);
srli s1, s0, 16
add s0, s0, s1
#x += (x >> 32);
li s1, 0 #srli 32bit = 0
add s0, s0, s1
li t1, 0x0000007f
and s1, s0, t1
sub s5, x0, s1 #n=s5
addi s5, s5, 32
ret
```
There are 3 test data in this program:
* test1: .word 17
* test2: .word 4353
* test3: .word 65553
Correct answer

# Analysis & Pipeline
Generated from Ripes
The 5-stage in-order processor with hazard detection/elimination and forwarding gets correct results on each of test data:

```
counter:
li s7, 0 # intial counter
```
## IF stage

* That observe the `addi x23 x0 0` instruction which is at address `0x00000000`
* Program Counter is `0x00000004`, which refers to the next instruction address.
* If without any branching occured ,`PC` is supposed to be `IR`+4
* And input of Instruction memory is `0x00000000`.
* Compressed decoder output the hex instruction `0x00000b93`
## ID stage

* The `addi x23 x0 0` is the I-type instruction.
>
| IMM | RD | OP |
| -------- | -------- | -------- |
| 00000000000000000000 | 10111 | 0010011 |
* Processor can get OP code `0010011` from Decode and decoding the `addi` instruction.
* The desstination register is `x23`.
## EXE stage

* Here we use OP1 and OP2 to implement `addi x23 x0 0`
## MEM stage

* This is an instruction that loads an immediate value doesn't involve memory access.
## WB stage

* At this stage, the result is written back into register x23, i.e. zero is written into register x23.
* The pseudo-instruction "li s1, 0" will actually be converted into a basic instruction. This basic instruction will be executed in the pipeline, and each stage will be processed once. The pseudo-instruction here is converted into an instruction to load an immediate value, so it will take multiple clock cycles in the pipeline to complete.
## CPU analysis
without loops

include loops
