# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by < [`GliAmanti`](https://github.com/GliAmanti) >
###### tags: `RISC-V`, `Jserv`
## Subject
### Implementation of Bit-Plane Slicing with CLZ
As we know, an image is essentially a grid of pixel values. For an **8-bit grayscale** image, a pixel value of **0** is encoded as **00000000** in binary and **255** is encoded as **11111111**. The leftmost bit is the most significant bit (**MSB**) because it contributes the most to the value. Similarly, the rightmost bit is the least significant bit (**LSB**).
In **bit-plane slicing**, we divide the image into 8 planes. **Plane 1** contains the **LSBs** of all the pixel bytes in the image, and **Plane 8** contains all the **MSBs**. It's a useful technique for image enhancement, since the high-order bit planes carry most of the visually significant information. It lets us highlight the contribution that specific high-order bits make to the overall appearance of the image.
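
To make the plane numbering concrete, here is a minimal C sketch of extracting a single bit plane. It is only an illustration, not the code from my repository; the function name `extract_bit_plane` and its parameters are hypothetical, and mapping a set bit to 255 is an assumption made so that the plane can be viewed as a black-and-white image.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract bit plane `plane` (1 = LSB plane, 8 = MSB plane) from an 8-bit
 * grayscale image stored as a flat array. Each output pixel becomes 255
 * if the selected bit is set and 0 otherwise (an assumed display choice). */
void extract_bit_plane(const uint8_t *image, uint8_t *out, size_t size, int plane)
{
    for (size_t i = 0; i < size; i++) {
        uint8_t bit = (image[i] >> (plane - 1)) & 1u;
        out[i] = bit ? 255 : 0;
    }
}
```

Calling it once per plane (1 through 8) would produce the eight arrays of a full bit-plane decomposition.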



## Solution
### My Thinking Path
In most implementations, the input image is divided into 8 arrays, each the same size as the input and each holding the corresponding bit of every pixel. This makes both the time and space costs fairly large, since we have to check every bit and store the results in separate arrays.
As a result, I simplify the problem to **"showing the MSB plane only"**, since that plane carries the most detail.
### Function Explanation
There are 2 steps in my program.
First, I traverse the input image (array) to get the MSB position of every pixel, recording the maximum MSB position along the way. In this step, the **CLZ** (count leading zeros) function provides a fast way to find the MSB.
Second, I reconstruct the image by comparing each pixel's MSB with the maximum MSB. If they are equal, the pixel is assigned **255 (white)**, which makes the output image clearer. Otherwise, the pixel is assigned **0 (black)**.
If the program works, the output image will look like the image shown above.
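
The two steps can be sketched in C as follows. This is a simplified illustration under stated assumptions, not the code in the linked repository: `clz32` and `msb_plane` are hypothetical names, the loop-based CLZ is the simplest possible version, pixels are assumed to be stored as 32-bit words, and a pixel value of 0 is assumed to have no MSB (index -1).

```c
#include <stddef.h>
#include <stdint.h>

/* Count leading zeros of a 32-bit value with a simple loop.
 * Returns 32 for x == 0. (Hypothetical helper for illustration.) */
static int clz32(uint32_t x)
{
    int n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Step 1: store the MSB index of each pixel in msb[] (bit 0 is the LSB,
 * -1 means the pixel is 0) and track the largest index seen.
 * Step 2: pixels whose MSB equals that maximum become 255, the rest 0. */
void msb_plane(const uint32_t *input, int *msb, uint32_t *output, size_t size)
{
    int max_msb = -1;
    for (size_t i = 0; i < size; i++) {
        msb[i] = 31 - clz32(input[i]);
        if (msb[i] > max_msb)
            max_msb = msb[i];
    }
    /* Assumption: an all-zero image stays black instead of turning white. */
    for (size_t i = 0; i < size; i++)
        output[i] = (max_msb >= 0 && msb[i] == max_msb) ? 255 : 0;
}
```

For the input `{255, 0, 128, 1}` the MSB indices are `7, -1, 7, 0`, so the maximum is 7 and only the first and third pixels turn white.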
## Implementation
### C Code
Here is my source code in [GitHub](https://github.com/GliAmanti/ComputerArchitecture_HW1/blob/main/hw1_v2.cpp).
### RISC-V Assembly Code
Here is my source code in [GitHub](https://github.com/GliAmanti/ComputerArchitecture_HW1/blob/main/hw1_v2.s).
## Result
* ### test case 0
* `input[] = {255, 0, 128, 1}`
* `size = 4`
* `caseNum = 0`
* **`output = 255 0 255 0`**
* ### test case 1
* `input[] = {167, 133, 111, 144, 140, 135, 159, 154, 148}`
* `size = 9`
* `caseNum = 1`
* **`output = 255 255 0 255 255 255 255 255 255`**
* ### test case 2
* `input[] = {50, 100, 150, 200, 250, 255}`
* `size = 6`
* `caseNum = 2`
* **`output = 0 0 255 255 255 255`**
### C Code Output

### RISC-V Assembly Code Output

### Memory
The base addresses of test cases **`0`**, **`1`** and **`2`** are **`0x10000000`**, **`0x10000010`** and **`0x10000034`**, respectively. I transform the original input arrays into MSB arrays. The words in the screenshot show the MSB position of every pixel (bit positions are counted from 0 at the LSB). Based on the MSB arrays, we can decide which color to assign to each pixel.
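
The base addresses are consistent with the arrays being stored back to back as 32-bit words, which is an assumption about the data layout rather than something read from the assembler output. A small C sketch of the arithmetic, with the hypothetical helper `next_base`:

```c
#include <stddef.h>
#include <stdint.h>

/* Address just past an array of `count` 32-bit words starting at `base`.
 * Assumes the next array is placed immediately after, with no padding. */
uint32_t next_base(uint32_t base, size_t count)
{
    return base + (uint32_t)(count * sizeof(uint32_t));
}
```

With 4 words in test case 0, `next_base(0x10000000, 4)` gives `0x10000010`, and with 9 words in test case 1, `next_base(0x10000010, 9)` gives `0x10000034`.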

## Analysis
I test my code using [Ripes](https://github.com/mortbopet/Ripes) simulator.
### Pseudo instruction
Here is my source code in [GitHub](https://github.com/GliAmanti/ComputerArchitecture_HW1/blob/main/hw1_v2_disassemble.s).
### Pipeline Stage Explanation
Ripes provides several processor models to run the code. I chose the **5-stage processor** to run my program.

I take the instruction **`lw s4, 0(t2)`** as an example and analyze how the processor handles it in each stage.
According to [RISC-V Manual (p.18-19)](https://riscv.org//wp-content/uploads/2017/05/riscv-spec-v2.2.pdf):
> Load and store instructions transfer a value between the registers and memory. Loads are encoded in the **I-type format** and stores are S-type. The effective byte address is obtained by adding register *rs1* to the sign-extended 12-bit offset. Loads copy a value from memory to register *rd*. Stores copy the value in register *rs2* to memory.

We can also get the function code and opcode from [RISC-V Manual (p.104)](https://riscv.org//wp-content/uploads/2017/05/riscv-spec-v2.2.pdf).

Let’s check what the pipeline does in each stage.
* #### IF: Instruction Fetch
:::success
Fetch the instruction from Instruction Memory.
:::

* The address of instruction **`lw s4, 0(t2)`** is `0x00000054`, so the PC outputs `0x00000054`, which is the input of the Instruction Memory.
* The output of the Instruction Memory is the corresponding instruction `0x0003aa03`, the 32-bit machine code that encodes the opcode, function code, register indices and immediate.
* Because no jump or branch is taken, the PC (`0x00000054`) is incremented by 4 to `0x00000058` for the next instruction.
* #### ID: Instruction Decode
:::success
Decode the instruction and read the values from Register File.
:::

* The instruction `0x0003aa03` is decoded into the following fields.

| imm[11:0] | rs1 | funct3 | rd | opcode |
|:------------:|:----:|:------:|:--------:|:--------:|
| 000000000000 | 00111 (x7) | 010 | 10100 (x20) | 0000011 |

So we can get the following information.
* `opcode` = `LW`
* `R1 idx` = `0x07` (x7)
* `Wr idx` = `0x14` (x20)
* `imm.` = `0x000`
* `R2 idx` = `0x00` (**unused**)
* `R1 idx` will be sent to Register File to get the value of register x7.
* `Reg1` = `0x00000000`
* `R2 idx` will be sent to the Register File to get the value of register x0, but the LW instruction **does not use** it.
* `Reg2` = `0x00000000`
* The immediate value will be sent to the Immediate Value Decoder and sign-extended to a 32-bit number for later calculation.
* `imm.` = `0x00000000` (`0x000` occupies the lowest 12 bits, and the upper 20 bits are filled with copies of its sign bit, which is 0 here)
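
The field extraction above can be checked with a short C sketch. It follows the I-type layout shown in the table; the struct `itype_fields` and the function `decode_itype` are hypothetical names for illustration and do not describe how Ripes implements its decoder.

```c
#include <stdint.h>

typedef struct {
    uint32_t opcode;  /* bits  6:0  */
    uint32_t rd;      /* bits 11:7  */
    uint32_t funct3;  /* bits 14:12 */
    uint32_t rs1;     /* bits 19:15 */
    int32_t  imm;     /* bits 31:20, sign-extended to 32 bits */
} itype_fields;

itype_fields decode_itype(uint32_t insn)
{
    itype_fields f;
    f.opcode = insn & 0x7Fu;
    f.rd     = (insn >> 7) & 0x1Fu;
    f.funct3 = (insn >> 12) & 0x7u;
    f.rs1    = (insn >> 15) & 0x1Fu;
    /* Sign-extend the 12-bit immediate: flip bit 11, then subtract 2^11. */
    uint32_t imm12 = insn >> 20;
    f.imm = (int32_t)(imm12 ^ 0x800u) - 0x800;
    return f;
}
```

Decoding `0x0003aa03` this way yields opcode `0x03`, `rd` = 20, `funct3` = 2, `rs1` = 7 and an immediate of 0, matching the values listed above.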
* #### EXE: Execute
:::success
Execute calculation or compute memory address using ALU.
:::

* The first-level multiplexers select the values coming from `EXE stage forwarding` and `Reg 2`.
::: danger
A data hazard occurs because **`lw s4, 0(t2)`** follows **`slli t2, s1, 2`**, which writes `t2`. So the pipeline takes the value of `t2` from EXE stage forwarding instead of the Register File.
:::
* The second-level multiplexers select the output of the upper first-level multiplexer and `imm.`.
* The outputs of the second-level multiplexers are also sent to the branch block, but no branch is taken.
* The ALU adds the two operands, and the result is sent to the next stage.
* `op1` = `0x10000000`
* `op2` = `0x00000000`
* `Res` = `0x10000000`
* `Wr idx` passes through this stage and goes to the MEM stage.
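
The operand selection and address calculation in this stage can be sketched in C. The forwarding condition used here is the common textbook one (the older instruction writes a register, that register is not x0, and it matches the register being read); it is an assumption and may differ from how Ripes wires its forwarding unit. The names `select_op1` and `load_address` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pick ALU operand 1: use the forwarded result when the older instruction
 * writes the register that this instruction reads (and it is not x0).
 * Otherwise fall back to the value read from the Register File. */
uint32_t select_op1(uint32_t reg1, uint32_t rs1_idx,
                    bool fwd_reg_write, uint32_t fwd_rd_idx, uint32_t fwd_value)
{
    if (fwd_reg_write && fwd_rd_idx != 0 && fwd_rd_idx == rs1_idx)
        return fwd_value;
    return reg1;
}

/* Effective address of a load: operand 1 plus the sign-extended offset. */
uint32_t load_address(uint32_t op1, int32_t imm)
{
    return op1 + (uint32_t)imm;
}
```

In the example, the Register File still holds `0x00000000` for x7, while the forwarded result of `slli t2, s1, 2` is `0x10000000`, so `select_op1` returns `0x10000000` and `load_address` with an offset of 0 gives `Res` = `0x10000000`.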
* #### MEM: Memory Access
:::success
Read data from or write data to the Data Memory.
:::

* `Res` will be sent to Data Memory to get the value of address `0x10000000`.

* `Read out` = `0x000000ff`
* `Reg 2` will be sent to `Data in`, but the Data Memory's write enable is not asserted, so nothing is written.
* `Wr idx` passes through this stage and goes to the WB stage.
* #### WB: Write Back
:::success
Write the result back to the Register File.
:::

* The multiplexer selects `Read out` from the Data Memory as the final output.
* `out` = `0x000000ff`
* `out` and `Wr idx` are sent back to the Register File. Finally, the value `0x000000ff` is written into register x20, whose ABI name is s4.
* After all these stages are done, the register is updated like this:
