# Adapt/Rewrite a non-trivial application in RISC-V assembly, running on Ripes

> contributed by [weiweiwei68](https://github.com/weiweiwei68/A-non-trivial-application-in-RISC-V-assembly)

## Image processing

Image processing is a field that involves manipulating and analyzing images to extract useful information or enhance visual features. It plays a crucial role in various applications, such as computer vision, CNNs (convolutional neural networks), satellite imaging, and more.

The following function performs convolution on an image of a fixed size. It takes two arguments: the first is the image to be processed, and the second is the kernel matrix.

```c
#define IMAGE_SIZE 9   // the sample image is 9x9
#define KERNEL_SIZE 3  // the sample kernel is 3x3

void convolution(int image[IMAGE_SIZE][IMAGE_SIZE], int kernel[KERNEL_SIZE][KERNEL_SIZE]) {
    // No padding and a stride of 1, so the output is smaller than the input
    int result[IMAGE_SIZE - KERNEL_SIZE + 1][IMAGE_SIZE - KERNEL_SIZE + 1];

    // Perform convolution
    for (int i = 0; i <= IMAGE_SIZE - KERNEL_SIZE; i++) {
        for (int j = 0; j <= IMAGE_SIZE - KERNEL_SIZE; j++) {
            int sum = 0;
            // Multiply-accumulate over the window whose top-left corner is (i, j)
            for (int k = 0; k < KERNEL_SIZE; k++) {
                for (int l = 0; l < KERNEL_SIZE; l++) {
                    sum += image[i + k][j + l] * kernel[k][l];
                }
            }
            result[i][j] = sum;
        }
    }
}
```

Now, assume that the pixel values of the original image and the kernel are defined as follows.

```c
// Sample 9x9 image
int image[IMAGE_SIZE][IMAGE_SIZE] = {
    {0, 3, 9, 6, 4, 7, 8, 1, 2},
    {7, 8, 5, 1, 3, 6, 6, 3, 0},
    {8, 5, 6, 9, 1, 0, 5, 2, 4},
    {9, 5, 1, 6, 4, 0, 7, 1, 9},
    {5, 7, 3, 2, 1, 6, 1, 0, 1},
    {7, 5, 8, 4, 0, 3, 5, 2, 1},
    {5, 1, 8, 6, 4, 6, 8, 5, 9},
    {0, 1, 5, 7, 4, 3, 4, 8, 9},
    {1, 0, 5, 2, 3, 6, 9, 1, 7},
};

// Sample 3x3 kernel
int kernel[KERNEL_SIZE][KERNEL_SIZE] = {
    {-1, -1, -1},
    {-1,  8, -1},
    {-1, -1, -1},
};
```

Output after convolution

> 21 -7 -35 -10 14 16 -4
> -9 8 45 -21 -32 15 -19
> -4 -35 21 7 -25 41 -21
> 13 -14 -11 -17 27 -16 -27
> -4 28 0 -32 -7 9 -14
> -31 27 8 -1 17 28 -6
> -17 10 19 -5 -20 -14 12

The C code and the RISC-V assembly produce identical results, which shows that the RISC-V assembly implementation handles the convolution task correctly.

## Ripes

[Ripes](https://github.com/mortbopet/Ripes) is a graphical RISC-V pipeline simulator and assembly editor. It lets users visualize how RISC-V code executes, which helps in understanding and analyzing the program's behavior.

### Performance

The following picture shows the execution information that Ripes reports for the assembly code.

![image](https://hackmd.io/_uploads/HJzrU09Op.png)

## Pipeline stage

### 5-stage pipelined processor

![image](https://hackmd.io/_uploads/r1Mx-w5O6.png)

The image above shows an emulated RISC-V processor with a 5-stage pipeline for executing instructions. A 5-stage pipelined processor is a microprocessor architecture that divides the execution of each instruction into five stages. Each stage is responsible for a specific operation, and multiple instructions can be in different stages of execution at the same time. The primary purpose of pipelining is to increase overall instruction throughput and improve performance. Each of the five stages is introduced below, together with observations from Ripes.

### Instruction Fetch (IF)

In this stage, the processor fetches the instruction from memory at the address held in the program counter (PC). The PC is then incremented by 4 to point to the next instruction in the sequence. This value reflects the fact that one instruction occupies one word, and each word is 4 bytes. In the following picture, the PC value is ```0x00000008```, and the increment of 4 will be applied to the PC once the current instruction is fetched.

![image](https://hackmd.io/_uploads/HJ5CED9_p.png)

### Instruction Decode (ID)

The fetched instruction is decoded in this stage to determine the type of operation to perform and the operands involved. Register values may also be read from the register file during this stage, and control signals are generated to direct the subsequent stages of the pipeline. In the following picture, the instruction ```lw x8, 0(x8)``` is decoded after being fetched from memory. Notably, because an ```lw``` instruction writes a register, the register write-enable signal will be set to 1 in a later stage.

![image](https://hackmd.io/_uploads/HJ7ktw5up.png)

### Execution (EX)

During this stage, the arithmetic or logic operation specified by the instruction is actually executed. The ALU (Arithmetic Logic Unit) performs the operation and computes the result. In the following picture, the ALU computes the memory address. By adding the base address in ```x8``` and the offset ```0```, it determines the target address from which the data will be loaded in the subsequent stage.

![image](https://hackmd.io/_uploads/rylPOH_cOa.png)

### Memory Access (MEM)

If the instruction accesses memory (e.g., a load or store operation), this stage is responsible for that memory transaction. For load operations, data is read from memory; for store operations, data is written to memory.
In the following picture, memory is accessed at the target address computed in the preceding stage. Because a load only reads memory, the memory write control signal is set to 0, marked in red in the picture.

![image](https://hackmd.io/_uploads/ryiRc_q_a.png)

### Write Back (WB)

The final stage writes the result of the executed instruction back to the register file or other relevant registers. This stage completes the instruction execution cycle and makes the result available to subsequent instructions. In the following pictures, the Write Back (WB) stage writes the result value into the register file. Notably, the control signal for writing to the register file is set to 1, marked in green, which matches the write-enable behavior mentioned in the ID stage.

![image](https://hackmd.io/_uploads/rk9j6dcOp.png)

![image](https://hackmd.io/_uploads/BJGQ1F5ua.png)

### Pipeline Process

![image](https://hackmd.io/_uploads/r1GP42qua.png)

![image](https://hackmd.io/_uploads/HyOFN2cda.png)

Pipelining is a technique in microprocessor design that increases instruction throughput by breaking the execution of instructions into multiple stages, so that different instructions can be processed simultaneously at different stages. Each stage of the pipeline can work on a different instruction, which improves overall processor efficiency.

A 5-stage pipelined processor can handle up to 5 instructions at once, with each instruction in a different stage of the pipeline. This overlap keeps the processor's functional units busy and, consequently, increases throughput.

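As a rough worked example of the throughput gain, assume an ideal pipeline with $k$ stages, one cycle per stage, and no stalls or flushes. The symbols $k$ and $n$ and the instruction count of 100 are illustrative assumptions, not measurements from Ripes. The first instruction needs $k$ cycles to complete, and each later instruction completes one cycle after its predecessor, whereas an unpipelined design under the same assumptions would spend $k$ cycles on every instruction:

$$
T_{\text{pipelined}} = k + (n - 1), \qquad T_{\text{unpipelined}} = n \cdot k
$$

For $k = 5$ and $n = 100$, this gives $104$ cycles instead of $500$, a speedup of about $4.8$ that approaches $5$ as $n$ grows.
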
However, while pipelining can enhance performance, it also introduces challenges such as data hazards, control hazards, and pipeline stalls, which must be managed to ensure correct execution. Techniques like forwarding, branch prediction, and out-of-order execution are often employed to address these challenges and keep pipelined processors effective.

### Hazard

#### Data hazard

A data hazard occurs when an instruction depends on the result of a previous instruction that is still in the pipeline. If the dependent instruction tries to read the data before the producing instruction has written it, a hazard occurs. Data hazards can cause pipeline stalls or incorrect program behavior, so they need to be managed carefully. The following image shows a hazard that occurs when the instruction ```addi x8, x8, -3``` follows the instruction ```lw x8, 0(x8)```. The ```addi``` needs the value of ```x8``` that ```lw``` is still loading, and a loaded value is not available until the load has passed through the MEM stage.

![image](https://hackmd.io/_uploads/BkyG965_T.png)

#### Control hazard

A control hazard, also known as a branch hazard, occurs when the outcome of a branch instruction (e.g., a conditional branch) is not yet known at the time subsequent instructions are fetched. Because of this uncertainty, the pipeline must either stall until the branch is resolved or fetch speculatively and flush the wrongly fetched instructions if the guess turns out to be incorrect. Techniques like branch prediction are used to reduce the impact of control hazards and keep the pipeline running efficiently. The following image shows a branch misprediction: the instructions fetched from the wrong path must be flushed to prevent errors.

![image](https://hackmd.io/_uploads/ryCR0aqda.png)

## Reference

* [Wiki-image processing](https://en.wikipedia.org/wiki/Kernel_(image_processing))
* [Ripes](https://github.com/mortbopet/Ripes)