owned this note
owned this note
Published
Linked with GitHub
# Lab1:RV32I simulator
###### tags: `Computer Architecture 2021`
## Environment
* [Ripes](https://github.com/mortbopet/Ripes) : is a graphical 5-stage RISC-V pipeline simulator & assembly editor.
## Number of 1 Bits ([191. Number of 1 Bits](https://leetcode.com/problems/number-of-1-bits/))
- **Problem Definition**:Write a function that takes an unsigned integer and returns the number of '1' bits it has (also known as the Hamming weight).
- The C code implementation is as follow.
```c
int number_of_1bits(uint32_t n){
int count = 0;
while (n != 0){
n = n & (n - 1);
count++;
}
return count;
}
```
## Number of 1 Bits Assembly
```c
.data
argument: .word 0b00000000000000000000000000001011
str1: .string "Number of "
str2: .string " 1 bits is "
str3: .string "'s"
.text
main:
mv a0,x0 # count = 0
lw a1,argument # n = 00000000000000000000000000001011
jal ra,func
# print the result to console
mv a1,a0
lw a0,argument
jal ra,printResult
# Exit program
li a7,10
ecall
func:
mv t0,a0 # t0=count
mv t1,a1 # t1=argument
_beq:
bne t1,x0,cnt # if(n!=0) goto cnt
_ret:
mv a0,t0 # a0=count
ret
cnt:
addi t2,t1,-1 # t2 = argument-1
and t1,t1,t2 # n = n&(n-1)
addi t0,t0,1 # count++
j _beq
printResult:
mv t0,a0 # t0:input value
mv t1,a1 # t1:count
la a0,str1
li a7,4
ecall
mv a0,t0
li a7,1
ecall
la a0,str3
li a7,4
ecall
la a0,str2
li a7,4
ecall
mv a0,t1
li a7,1
ecall
ret
```
### Result
we can check result in console

### How to use ?
You can change the value you want on the argument.
This argument value can be binary or decimal value.

:::warning
TODO: Check [this note](https://hackmd.io/@WeiCheng14159/BkiRTKdUP) for further optimization and then re-implement with RISC-V
:notes: jserv
:::
### Assembly Explained
#### main idea
To compute the 1’s of a number, I apply the idea that the result of a & (a - 1) will always be the value of a without the least significant 1.
For example:
let's try all the 4-bit combinations:
| n | n(binary) | n-1 |n&(n-1)|
| -------- | -------- | -------- | -------- |
|0 | 0000|0111|0000|
|1 | 0001|0000|0000|
|2 | 0010|0001|0000|
|3 | 0011|0010|0010|
|4 | 0100|0011|0000|
|5 | 0101|0100|0100|
|6 | 0110|0101|0100|
|7 | 0111|0110|0110|
|8 | 1000|0111|0000|
|9 | 1001|1000|1000|
|10| 1010|1001|1000|
|11| 1011|1010|1010|
|12| 1100|1011|1000|
|13| 1101|1100|1100|
|14| 1110|1101|1100|
|15| 1111|1110|1110|
After the operation, if n>0, recalculate again until n==0.This will calculate the number of 1 bits.
There're 5 sections this program namely `main`, `func`, `_beq`, `_ret`,`cnt`,`printResult`
* `main`
* Initialize `a0` argument register to 0, `a0` represents the `count` variable.
* Load input value to the `a1` argument register.
* `func`
* move `a0` argument register to `t0` temporary register.
* move `a1` argument register to `t1` temporary register.
* `_beq`
* Compare `t1` temporary register with 0, if not equal, jump to cnt.
* `_ret`
* return value to main.
* `cnt`
* The `t2` temporary register stores the result of subtracting 1 from the `t1` temporary register.
* The `t1` temporary register stores the bit-wise AND result of `t1` and `t2`.
* Add 1 to the `t0` temporary register representing the `count` variable.
* `printResult`
* How to print out the result ? The `ecall` instruction can do this. The `ecall` instuction represented system call in RISC-V.
* We can check the system call by looking up the system service table.
For example: the system call corresponding to print string is 4
```asm=
la a0,str1 # str1: "Number of "
li a7,4
ecall
mv a0,t0 # t0:integer
li a7,1
ecall
```

## Analyze
### RISC-V Pipline
**Ripes** is a 5-stages pipeline CPU

* **IF**
* In this stage the CPU reads instructions from the address in the memory whose value is present in the program counter
* **ID**
* In this stage, instruction is decoded and the register file is accessed to get the values from the registers used in the instruction.
* **EX**
* In this stage, ALU operations are performed.
* **MEM**
* In this stage, memory operands are read and written from/to the memory that is present in the instruction.
* **WB**
* In this stage, computed/fetched value is written back to the register present in the instructions.
### Pipeline Hazard
* **Structure Hazard**
* Different instructions want to use the same resource at the same time
* **Data Hazard**
* An instruction in the pipeline needs to use a result that has not been produced by the instruction in the previous stage
* **Control Hazard**
* When the Branch instruction has not yet decided whether to branch, but the subsequent instructions have entered the pipeline, the execution sequence is wrong.
#### How to handle hazard
```asm=
mv a0,x0 # count = 0
lw a1,argument # n = 00000000000000000000000000001011
jal ra,func
# print the result to console
mv a1,a0
lw a0,argument
```
* When the `jal` instruction is in the exe stage, there are instructions in the IF and ID stages, but they are not in the correct order, because `jal` has to jump to the `func` label

* When the `jal` instruction is in the MEM stage, the correct execution address will be sent to `pc` counter and the ID and EXE stage instructions will be **flushed**.

* when the `jal` instruction is executed,the **address** of the next command of `jal` will be stored in the `ra` register.

```asm=
_beq:
bne t1,x0,cnt # if(n!=0) goto cnt
_ret:
mv a0,t0 # a1=count
ret
```
* When the `bne` instruction is in the exe stage, it has not been decided whether to jump to the `cnt` label.
* There are instructions in the IF and ID stages, but they are not in the correct order.
* This is **Control Hazard**.

* When the `bne` instruction is in the MEM stage, it is known to jump to the `cnt` label, so the ID and EXE stage instructions will be **flushed**.
* So the instructions in the IF phase will be correct.

```asm=
addi t2,t1,-1 # t2 = argument-1
and t1,t1,t2 # n = n&(n-1)
```
* There is a **data hazard** between `addi` and `and`, because in the `addi` instruction, the `t2` register will be written back in the WB stage, and in the `and` instruction, the `t2` register will be used in the EXE, which will cause The value of `t2` register in the `and` instruction is incorrect.
* So after `addi` enters the mem stage, it will also forward the coreect value of `t2` register to the exe stage.
* So the value of `t2` register in the `and` instruction is correct.

## Number of 1 Bits (Version 2)
- The C code implementation is as follow.
```c=
// __popcount(x) is a function returns the number of 1-bits set in an int x
int __popcount(unsigned int n){
uint32_t mask[] = {0x55555555,0x33333333,0x0F0F0F0F,0x00FF00FF,0x0000FFFF};
for(int i = 0,j = 1;i < 5; i++,j <<= 1){
n = (n & mask[i]) + ((n >> j) & mask[i]);
}
return n;
}
```
## Number of 1 Bits Assembly (Version 2)
```c=
.data
mask: .word 0x55555555,0x33333333,0x0F0F0F0F,0x00FF00FF,0x0000FFFF
input: .word 0x000000053
str1: .string "Number of "
str2: .string " 1 bits is "
str3: .string "'s"
.text
lw s1,input # s1 = input value
la s2,mask # s2 = mask
add a0,s1,x0 # function argument
add a1,s2,x0 # function argument
jal ra,popcount # jump to popcount func
# print the result to console
mv a1,a0
lw a0,input
jal ra,printResult
# Exit program
li a7,10
ecall
popcount:
mv t0,a0 # t0 = input value
addi t1,x0,1 # j = 1
add t2,x0,x0 # i = 0
_beq:
sltiu t3,t2,5 # if(i<5),t3 = 1
beqz t3,_return # if(t3 == 0) goto _return
slli t3,t2,2 # convert to offset of mask
add t4,t3,a1 # address of current index
lw t3,0(t4) # t3 = mask[i]
and t4,t0,t3 # t4 = input value & mask[i]
srl t0,t0,t1 # input = input >> j
and t0,t0,t3 # input = input & mask[i]
add t0,t0,t4 # n = (n & mask[i]) + ((n >> j) & mask[i])
slli t1,t1,1 # j <<= 1
addi t2,t2,1 # i++
j _beq
_return:
mv a0,t0 # return value
ret
printResult:
mv t0,a0 # t0:input value
mv t1,a1 # t1:result
la a0,str1
li a7,4
ecall
mv a0,t0
li a7,1
ecall
la a0,str3
li a7,4
ecall
la a0,str2
li a7,4
ecall
mv a0,t1
li a7,1
ecall
ret
```
## Reference
* [191. Number of 1 Bits](https://leetcode.com/problems/number-of-1-bits/)
* [geeksforgeeks](https://www.geeksforgeeks.org/computer-organization-and-architecture-pipelining-set-1-execution-stages-and-throughput/)
* [note](https://hackmd.io/@WeiCheng14159/BkiRTKdUP)