Assignment1: RISC-V Assembly and Instruction Pipeline

# Implement Binarization by count leading zero contributed by [edenlin](https://github.com/JinYu1225/2023_Computer_Architecture) The Problem I chose to derive is the Problem A in [quiz1 - Count Leading Zero](https://hackmd.io/@sysprog/arch2023-quiz1#Problem-A). Binarization is widely used in image processing. In grayscale, we divide the depth of the color from 255 to 0. 255 means totally white while 0 means totally black. Binarization will convert each pixel to either 0 or 255 by a "threshold". Threshold is a parameter determines the value that pixel should convert to. When the color depth is greater than than the threshold, pixel converts to 255 (White). In contrast, if the color depth is smaller, pixel converts to 0 (Black). | Threshold = 128 | Before | After | | --- | --- | --- | | Color Depth 1 | 20 | 0 | | Color Depth 2 | 200 | 255 | In this homework, pixel converts to 0 (Black) when the depth is equal to the threshold. The leading zeros will be a way to determine whether the depth is greater or smaller than the threshold. ## Implemention Please check the source code in my [github](https://github.com/JinYu1225/2023_Computer_Architecture). ### C code In the following C code gives an example for a binarization of a 5-pixel data. The threshold is set to `231` and can be changed from `0` to `255`. The value of each color depth of pixel should be limit from `0` to `255`. The `0x7f` in the CLZ part has been changed to `0x3f` for 32-bit progression. ```clike= #include <stdint.h> #include <stdio.h> uint32_t count_leading_zeros_32(uint32_t x) { x |= (x >> 1); x |= (x >> 2); x |= (x >> 4); x |= (x >> 8); x |= (x >> 16); x -= ((x >> 1) & 0x55555555); x = ((x >> 2) & 0x33333333) + (x & 0x33333333); x = ((x >> 4) + x) & 0x0f0f0f0f; x += (x >> 8); x += (x >> 16); return (32 - (x & 0x3f)); // change 0x7f to 0x3f } int main(){ // pixel test // 8-bit color depth for black and white photo uint32_t picture[5] = {20,80,128,150,231}; uint32_t threshold = 230; uint32_t *pixel = &picture; for(int i = 0; i < 5; i++){ uint32_t sub = threshold - *(pixel+i); printf("%d, ",i); printf("before = %d, ",*(pixel+i)); sub = count_leading_zeros_32(sub); if(sub) *(pixel+i) = 0; else *(pixel+i) = 255; printf("after = %d\n",*(pixel+i)); } return 0; } ``` ### Assembly Code ```= # input the image from "test" # output the binariztion result to "result" # "threshold" can be change from 0 to 255 # each value of "test" can be change from 0 to 255 # if the number of "test" being change, # the number of "result" and the "s1" parameter in "main" should be change as well .data test: .byte 20, 80, 128, 150, 231 result: .byte 0, 0, 0, 0, 0 threshold: .byte 0x80 mask1: .word 0x55555555 mask2: .word 0x33333333 mask4: .word 0x0f0f0f0f str1: .string " " str2: .string "\nThe Binarization Result:\n" .text main: # a1 = address of pixel # a2 = address of result # a3 = threshold la a1, test # load address of pixel la a2, result # load address of result la a3, threshold # load address of threshold lbu a3, 0(a3) # load value of threshold BINARIZATION: li s1, 5 # count of haven't done pixel B_LOOP: lbu a4, 0(a1) # load value of pixel sub a0, a3, a4 # threshold - pixel jal ra, CLZ # jump to CLZ bne a0, x0, FLOOR # branch to FLOOR if CLZ = 0 CEIL: li a4, 255 j STORE_MOVE # jump to STORE_MOVE FLOOR: li a4, 0 STORE_MOVE: sb a4, 0(a2) addi s1, s1, -1 # total count -1 addi a1, a1, 1 # go to next address of pixel addi a2, a2, 1 # go to next address of result bne s1, x0, B_LOOP li s1, 5 # make count for RESULT_CHECK sub a2, a2, s1 # reset address of result la a0, str2 # start to print the result li a7, 4 ecall RESULT_CHECK: lbu a0, 0(a2) # load result to a0 jal ra, PRINT_INT addi s1, s1, -1 # count print result addi a2, a2, 1 # move to next address of result bne s1, x0, RESULT_CHECK j EXIT CLZ: # a0: x = the number to be counted leading zero # t0: shifted x # t1: x shift & mask # t2: mask PAD1: srli t0, a0, 1 # t0 = x >> 1 or a0, a0, t0 # x |= x >> 1 srli t0, a0, 2 # t0 = x >> 2 or a0, a0, t0 # x |= x >> 2 srli t0, a0, 4 # t0 = x >> 4 or a0, a0, t0 # x |= x >> 4 srli t0, a0, 8 # t0 = x >> 8 or a0, a0, t0 # x |= x >> 8 srli t0, a0, 16 # t0 = x >> 16 or a0, a0, t0 # x |= x >> 16 POPCNT: lw t2, mask1 # load mask1 to t2 srli t0, a0, 1 # t0 = x >> 1 and t1, t0, t2 # t1 = (x >> 1) & mask1 sub a0, a0, t1 # x -= ((x >> 1) & mask1) lw t2, mask2 # load mask2 to t2 srli t0, a0, 2 # t0 = x >> 2 and t1, t0, t2 # (x >> 2) & mask2 and a0, a0, t2 # x & mask2 add a0, t1, a0 # ((x >> 2) & mask2) + (x & mask2) srli t0, a0, 4 # t0 = x >> 4 add a0, a0, t0 # x + (x >> 4) lw t2, mask4 # load mask4 to t2 and a0, a0, t2 # ((x >> 4) + x) & mask4 srli t0, a0, 8 # t0 = x >> 8 add a0, a0, t0 # x += (x >> 8) srli t0, a0, 16 # t0 = x >> 16 add a0, a0, t0 # x += (x >> 16) andi t0, a0, 0x3f # t0 = x & 0x3f li a0, 32 # a0 = 64 sub a0, a0, t0 # 64 - (x & 0x3f) addi sp, sp, -4 # make space for ra in stack sw ra, 0(sp) # push ra into stack jal ra, PRINT_INT lw ra, 0(sp) # pull ra out of stack addi sp, sp, 4 # free the space create in stack for ra ret PRINT_INT: li a7, 1 # print the value of input a0 in int ecall addi sp, sp, -4 sw a0, 0(sp) la a0, str1 li a7, 4 ecall lw a0, 0(sp) addi sp, sp, 4 ret EXIT: li a7, 10 # Halts the simulator ecall ``` ### Result **Test:** * C :::info picture[5] = {20,80,128,150,231} Threshold = 128 +++++ result: 0 0 0 255 255 ::: ``` 0, before = 20, after = 0 1, before = 80, after = 0 2, before = 128, after = 0 3, before = 150, after = 255 4, before = 231, after = 255 ``` * RISC-V :::success test: .byte 20, 80, 128, 150, 231 Threshold: 0x80 = $128_{10}$ +++++ result: 0 0 0 255 255 ::: ![](https://hackmd.io/_uploads/rJTuW4e-a.png) ## Analysis processor stages by Ripes simulator Ripes is a useful tool help us learn about computer architecture and how instructhttps://hackmd.io/ions are processed within a processor. It simulates the various stages of a RISC-V processor pipeline, including IF, ID, EX, MEM, WB. Users can write RISC-V assembly code, load it into Ripes, and see how the instructions are executed in the processor step by step. This section will take the code above as example, and explain how the processor works in each stage. ### Data path In this homework, we use 5-Stage processor to run our code. ![](https://hackmd.io/_uploads/H1l2Mgdz-a.png) | Stage | Description | |:-----:| ---------------------------------- | | IF | Instruction Fetch | | ID | Instruction decode & Register Read | | EX | Execution or Address Calculation | | MEM | Data Memory Calcultion | | WB | Write Back | :::info Here is the description of each stage ::: * **Stage1 Fetch (IF):** In this stage, processor fetches the next instruction from memory. Actually, PC holds the address of the next instruction to be executed, and the stage retrieves corresponding instruction from memory. * **Stage2 Decode (ID):** Once the instruction is fetched, the processor will decodes it in the second stage. Processor will identify the instruction and determine the operands, including data and registers, which will be used. * **Stage3 Execute (EX):** The actual operation of the instruction will be executed. * **Stage4 Memory (MEM):** If the instruction involves accessing memory, those memory operations will be done in this stage. Not all instruction need this stage. * **Stage5 Write Black (WB):** Writing the results of the executed instruction back to the appropriate register or memory. PC also be updated during this stage. :::success The following introduction will take `auipc x11, 0x10000` as example. ::: ### IF ![](https://hackmd.io/_uploads/HJ6mhOG-T.png) The value go of PC is `0x00`, which means the first line of the instructions, `auipc x11 0x10000`. Then, the processor retrives the instruction as an 32-bit data in HEX. The format of `auipc` is as following. ![](https://hackmd.io/_uploads/rkyVwKfZp.png) | imm[32:12] | rd | opcode | |:------------------------:|:------:|:--------:| | 0001 0000 0000 0000 0000 | 0101 1 | 001 0111 | Also represents as `0x10000597` in HEX. ### ID ![](https://hackmd.io/_uploads/rkgFFKMWa.png) The instruction will be decoded in this stage. `auipc x11 0x10000` means directly uses the immediate value `0x10000` adding the PC. Therefore, R1, R2 won't be used in this intruction and is set to `0x00`. In block "Imm.", the result is the imm part of instruction `0x10000000`. ### EX ![](https://hackmd.io/_uploads/r1htv5Gbp.png) In execute stage, the mux choose the input `0x10000000` to pass to ALU, and ALU output the only immediate in `auipc x11 0x10000`. ### MEM ![](https://hackmd.io/_uploads/SyGqK9Gbp.png) Memory storing will be done in this stage. We can know this instruction don't need to save data to memory while the `Wr en` is RED. besides, the wire of `0x10000000` connect back to the MUX in the past stage to prepare for the further instructions, such as `addi x11 x11 0`. ### WB ![](https://hackmd.io/_uploads/BJJkh9fWp.png) This stage will write the result of execute stage into proper registers or memories. In this case, we choose `0x10000000`. By the way, this output wire aslo connects back to the MUX in EX and Wr data in ID.