--- tags : Computer_Architecture, csapp, cs61c, jserv, RISC-V --- # Computer Architecture HW1 contributed by <[`WangHanChi`](https://github.com/WangHanChi)> ## Question My question in Leetcode is [1550. Three Consecutive odds](https://leetcode.com/problems/three-consecutive-odds/), and it is an `easy` difficuty. The description of the question is as follows >Given an integer array `arr`, return `true` if there are three consecutive odd numbers in the array. Otherwise, return `false`. ## Solution ### My thinking path My problem-solving idea is to first check out th incoming arrays `arr[i]` whose size is less than three. Then, check the array one by one, that is, to determine whether it is odd from the beginning `arr[0]`. As long as three consecutive ones are found, it will return `true`, otherwise, it will return an `false`. ### Code optimization 1. `if(arrSize < 3) return false;` This can prevent test data which less than three from entering the `for` loop 2. `(*(arr + i) & 1)` In order to speed up program executio, I use bitwise `AND` to replace another spelling `(*(arr + i) %2 == 1)`. ## C code Here is my source code in [github](https://github.com/WangHanChi/2022_Computer_Architecture/tree/master/HW01). ```c #include <stdio.h> #include <stdlib.h> #include <stdbool.h> bool threeConsecutiveOdds(int *arr, int arrSize) { int count = 0; if (arrSize < 3) return false; for (int i = 0; i < arrSize; i++) { if (*(arr + i) & 1) { if (++count == 3) return true; } else { count = 0; } } return false; } int main(int argc, char *argv[]) { int test1[4] = {2, 6, 4, 1}; int test2[9] = {1, 2, 34, 3, 4, 5, 7, 23, 12}; int test3[9] = {1, 3, 5, 7, 9, 11, 13, 15, 17}; bool answer1 = threeConsecutiveOdds(test1, sizeof(test1) / sizeof(test1[0])); bool answer2 = threeConsecutiveOdds(test2, sizeof(test2) / sizeof(test2[0])); bool answer3 = threeConsecutiveOdds(test3, sizeof(test3) / sizeof(test3[0])); printf("The answer1 is %d\n", answer1); printf("The answer2 is %d\n", answer2); printf("The answer3 is %d\n", answer3); return 0; } /* The terminal output is hank@hank:~/Computer_Architecture/HW01$ ./a.out The answer1 is 0 The answer2 is 1 The answer3 is 1 */ ``` ## RISC-V assembly code Here is my source code in [github](https://github.com/WangHanChi/2022_Computer_Architecture/tree/master/HW01). ```c .data test0: .word 2, 6, 4, 1 test1: .word 1, 2, 34, 3, 4, 5, 7, 23, 12 test2: .word 1, 3, 5, 7, 9, 11, 13, 15, 17 str0: .string "\nThe answer is false\n" str1: .string "\nThe answer is true\n" str2: .string " " .text # s1 : array base address # s2 : the size of array(int arrSize) # s3 : int count = 0 # s4 : 3 # t0 = i # t1 = arr[i] # t2 # the value of (*(array + i) & 1) # t3 :1 main: la s1, test0 # Load the address to s1 addi s2, x0, 4 # Load the array size to s2 addi s4, x0, 3 # Load the constant s4 = 3 addi t3, x0, 1 # Load the constant t3 = 1 jal ra, print_number la s1, test0 # Load the address to s1 addi s2, x0, 4 # Load the array size to s2 jal ra, threeConsecutiveOdds la s1, test1 # Load the address to s1 addi s2, x0, 9 # Load the array size to s2 addi s4, x0, 3 # Load the constant s4 = 3 addi t3, x0, 1 # Load the constant t3 = 1 jal ra, print_number la s1, test1 # Load the address to s1 addi s2, x0, 4 # Load the array size to s2 jal ra, threeConsecutiveOdds la s1, test2 # Load the address to s1 addi s2, x0, 9 # Load the array size to s2 addi s4, x0, 3 # Load the constant s4 = 3 addi t3, x0, 1 # Load the constant t3 = 1 jal ra, print_number la s1, test2 # Load the address to s1 addi s2, x0, 4 # Load the array size to s2 jal ra, threeConsecutiveOdds j Exit print_number: lw t1, 0(s1) addi s1, s1, 4 addi a0, t1, 0 li, a7, 1 ecall la a0, str2 #load string li a7, 4 # Call system call, and print string ecall addi s2, s2, -1 bge s2, t3, print_number ret threeConsecutiveOdds: add t0, x0, x0 # int i = 0; blt s2, s4, false # if(arrSize < 3) return false #j forloop # arrSize >= 3, and will go to forloop forloop: lw t1, 0(s1) # Load the value of s1 to t1 addi s1, s1, 4 # *(arr + 1) andi t2, t1, 1 # (*(arr + i) & 1) and t2 will store it beq t3, t2, isEqual # (*(arr + i) & 1) equal 1 or not addi s3, x0, 0 # count = 0 addi t0, t0, 1 # i = i + 1 blt t0, s2, forloop # if i < arrSize j false isEqual: addi s3, s3, 1 # ++count beq s3, s4, true # if(++count == 3) j forloop # return to the forloop true: la a0, str1 #load string li a7, 4 # Call system call, and print string ecall ret # j Exit false: la a0, str0 #load string li a7, 4 # Call system call, and print string ecall ret # j Exit Exit: li a7, 10 # Exit the program ecall ``` :::warning :warning: Can you use fewer instructions? :notes: jserv ::: ## 10/4 Rewrite the RISC-V Assembly Code Follow the guidance of the professor, I tried to reduce the instructions in main and other function.I used the top-down execution feature to remove the excess part. As following is [new assembly code](https://github.com/WangHanChi/2022_Computer_Architecture/tree/master/HW01). ```c .data test0: .word 2, 6, 4, 1 test1: .word 1, 2, 34, 3, 4, 5, 7, 23, 12 test2: .word 1, 3, 5, 7, 9, 11, 13, 15, 17 str0: .string "\nThe answer is false\n" str1: .string "\nThe answer is true\n" str2: .string " " .text # a0 : save the return value # a1 : the base address of arr # a2 : the size of arr # t0 : i # t1 : count # s0 : arr[i] # s1 : the value of (*(arr + 1) & 1) main: addi s2, x0, 3 # Set s2 = 3 addi t3, x0, 1 # set t3 = 1 la a1, test0 # Load the address to a1 addi a2, x0, 4 # Load the array size to a2 jal ra, print_number # go to print_number la a1, test1 # Load the address to a1 addi a2, x0, 9 # Load the array size to a2 jal ra, print_number # go to print_number la a1, test2 # Load the address to a1 addi a2, x0, 9 # Load the array size to a2 jal ra, print_number # go to print_number j Exit # exit the program print_number: mv t2, a2 # copy the data form a2 to t2 lw s0, 0(a1) # load the data form a1 to s0 addi a1, a1, 4 # go to the next data addi a0, s0, 0 # store the data from s0 to a0 li, a7, 1 # call system call (printInt) ecall la a0, str2 # load string form str2 to a0 li a7, 4 # Call system call (printString) ecall addi a2, a2, -1 # arrSize -= 1 bge a2, t3, print_number # if (a2 > t3) go to print_number threeConsecutiveOdds: add t0, x0, x0 # int i = 0; blt s2, s3, false # if(arrSize < 3) return false forloop: lw s0, 0(a1) # Load the value form a1 to s0 addi a1, a1, 4 # go to the next data andi s1, s0, 1 # (*(arr + i) & 1) and s1 will store it beq t3, s1, isEqual # (*(arr + i) & 1) equal 1 or not addi t1, x0, 0 # count = 0 addi t0, t0, 1 # i = i + 1 blt t0, s2, forloop # if i < arrSize j false isEqual: addi t1, t1, 1 # ++count beq s2, t1, true # if(++count == 3) j forloop # return to the forloop true: la a0, str1 # load string li a7, 4 # Call system call (printString) ecall ret false: la a0, str0 # load string li a7, 4 # Call system call (printString) ecall ret Exit: li a7, 10 # Exit the program ecall ``` --- ## Result **Test1 :** * input[2, 6, 4, 1], arrSize = 4 * C_output = false * Assembly_output = 2 6 4 1 The answer is false **Test2 :** * input[1, 2, 34, 3, 4, 5, 7, 23, 12], arrSize = 9 * C_output = true * Assembly_output = 1 2 34 3 4 5 7 23 12 The answer is true **Test3 :** * input[1, 3, 5, 7, 9, 11, 13, 15, 17] * C_output = true * Assembly_output = 1 3 5 7 9 11 13 15 17 The answer is true Here are the picture of the output. * C code ![](https://i.imgur.com/DL7Hi6K.png) * RISC-V assembly code ![](https://i.imgur.com/vQrX8sO.png) --- ## Analysis We use [Ripes](https://github.com/mortbopet/Ripes) to simulate RISC-V processor ### Ripes Simulator Ripes is a **five-stage** execution instruction pipeline simulator. ![](https://i.imgur.com/WkZEFXY.png) Five-stage divides the Datapath. ![](https://i.imgur.com/rd9L6HR.png) | Stage | Description | | ----- | ---------------------------------- | | IF | Instruction Fetch | | ID | Instruction Decode & Register Read | | EX | Execution or Address Calculation | | MEM | Data Memory Access | | WB | Write Back | Here is the five-stage of the RISC pipeline with their respective operations : * **Stage 1 (Instruction Fetch)** * CPU reads instruction pipeline from the address in the memory whose value is present in the program counter(PC). * **Stage 2 (Instruction Decode)** * Instruction is decoded and the register file is accessed to get the values from the register used int the instruction. * In this stage, the CPU will decide which data to input between *(Reg1 + Reg2)* and *(Reg1 + Imm)* according to the instruction. * **Stage 3 (Instruction Execute)** * ALU operations are performed in this stage. * **Stage 4 (Memory Access)** * Memory operands are read and written from/to the memory that is present in the instruction. * **Stage 5 (Wirte Back)** * The value which is Computed or Fetched will be written back to the register present in the instructions. #### IF Use my code as an example, here is my [pseudo instruction](https://hackmd.io/BF459yoPRe2oLbZHiy22xw?view#Pseudo-Instruction). ![](https://i.imgur.com/rpTvy5m.png) ![](https://i.imgur.com/dcUwmB8.png) PC input the instrution which address is `0x00000004`, and the instruction memory will translate and output the 32 bits instruction for RISC-V CPU. In this case it will output `0x00300913`The odiginal instruction is `addi x18 x0, 3`.Beacuse of the I-Format instruction `addi` is as follownig. - [ ] imm : 0000, 0000, 0011 - [ ] rs1 : 00000(x0) - [ ] funct3 : 000 - [ ] rd : 10010(x18) - [ ] opcode : 0010011 | imm[11:0] | rs1 | funct3 | rd | opcode | | ------------ | ----- | ------ | ----- | ------- | | 000000000011 | 00000 | 000 | 10010 | 0010011 | So, it will be transform to Hex is `0x00300913`, that is the instruction memory output. #### ID ![](https://i.imgur.com/UvAGlaK.png) The instruction will be decoded, and output `R1 idx = 0x00`, `R2 idx = 0x03`.`REG1_out = 0x00000000`, `REG2_OUT = 0x10000000` And it put the immdeiate in the down place and it also send to next section. #### EX ![](https://i.imgur.com/OS9MLp0.png) In this stage, there are four multiplexer(MUX) to choise which inputs will be used.`Op 1 = 0x00000000`, `Op 2 = 0x00000003` and `alures_in = 0x00000003`. Because it execute the "add" instruction, so the result is predictable. #### MEM ![](https://i.imgur.com/Ceu09kw.png) In memory stage, we can find the `data_out = 0x100e1300` but the "Wr en" with red light represents we don't store the data to memory. By the way, there is a **"forwarding" circuit** and it connect back to the four MUX, it also carry the data `alures_out = 0x00000003`. #### WB ![](https://i.imgur.com/DThOA1v.png) In the finial stage, WB. It will store the data `in_1 = 0x00000003` to the destination register --- ## Pipeline Hazard A hazard is a situation that prevents starting the next instruction in the next clock cycle.There are three types of hazard. 1. **Structural hazard** * A required resource is busy 2. **Data hazard** * Data dependency between instructions * Need to wait for previous instruction to complete its data write 3. **Control hazard** * flow of execution depends on previous instruction ### Structural Hazard * Encounter problem : Two or more instructions in the pipeline compete for access to a single physical resource #### Register File Structural Hazard ![](https://i.imgur.com/cYEXrGt.png) We can see there are two register file be used in `ID` and `WB` at the same time! * Solution : There are two ways to solve 1. Build RegFile with independent read and write ports 2. ==Double Pumping== (which is adopted by RISC-V) * Double Pumping is split RegFile access in two! Prepare to writeduring 1st half, write on falling edge, read during 2nd half of each clock cycle. So that can allow it `read` and `write` to registers durnig same clock cycle. #### Memory structural hazard ![](https://i.imgur.com/v4fP1Pw.png) We can see there are two memory units be used in `IF` and `MEM` at the same time! * Solution : Without separate memory units, instruction fetch would have to stall for that cycle. ### Data hazard * Identifying data hazard : * Where is data WRITTEN? * Where is data READ? * Does the WRITE happen AFTER the READ? * Instruction depends on result from previous instruction.Taking the as following for an example.We would not wait for the result of `t0`, and we will execute the `sub` instruction which need the value of `t0`, that is a data Hazard! ```c add t0, t1, t2 sub t4, t0, t3 ``` #### R-type data hazard ![](https://i.imgur.com/liuem3Q.png) We can see the `t0` register be used in the neighboring instruction. * Solution : ==Stalling==, we use the `do nothing instruction` to fill the blank. ![](https://i.imgur.com/ygQVN9G.png) Decrease throughput of “valid” or useful instructions. Can also be seen as increasing the latency of our stalled instruction. * Solution : ==Forwarding==, Forward result as soon as it is available, even though it’s not stored in RegFile yet. ![](https://i.imgur.com/9kcRynW.png) Don’t wait for it to be stored in a register Requires extra hardware in the datapath (and extra control!) ![](https://i.imgur.com/sVHtCy5.png) We can also find that there is a dedicated circuit for processing `Forwarding` #### Load data hazard Encounter problem : Dataflow backwards in time are hazards, and it can't be solved by forwarding. ![](https://i.imgur.com/4GX5RVZ.png) * Soluting : ==stall==, wait for the stage to next. ### Control Hazard Encounter problem : Branch instruction will disrupt the original order, so pipeline can’t always fetch correct instruction until end of execute * Solution : ==Branch Prediction==, guess outcome of abranch, fix afterwards if necessary ![](https://i.imgur.com/vfPhq81.png) **Branch prediction in today is very effective!!! ** ### Find Hazard in my RISC-V assembly code 1. **Control hazard** ![](https://i.imgur.com/gSJ56jX.png) ![](https://i.imgur.com/ew5cWqp.png) In my assembly code, `jal ra, print_number` is a branch instruction. So, there is a control hazard.We can see it originally would process the next instruction `auipc x11, 0x10000` , but when it execute the branch instruction `jal ra, print_number`.It will jump to the `print_number` label, therefore, it will clear original two stages instructions and input the new instruction in `print_number` label. 2. **Data hazard** ![](https://i.imgur.com/J7mQC5C.png) I found that when calling ecall, it will automatically generate two NOPs.But I checked the value of registers i used, there are already correct value in the registers.Therefore, I guess that there are some underlying implementations of ecall use some operations that we don't know about. To check the guess, I test two types data(.word and .string) in Ripes and got the same result. Even if the registers of a0 and a7 already have values, they will still be stuffed with two NOPs. The following figure is an example of string as output data. ```c .data str: .string "hello" .text main: la a0, str addi x0, x0, 0 addi x0, x0, 0 addi x0, x0, 0 addi x0, x0, 0 addi x0, x0, 0 addi x0, x0, 0 li a7, 4 addi x0, x0, 0 addi x0, x0, 0 addi x0, x0, 0 addi x0, x0, 0 addi x0, x0, 0 addi x0, x0, 0 ecall ``` ![](https://i.imgur.com/BqEdxex.png) --- ## Appendix1 : Pseudo Instruction Ripes will automatically convert the pseudo instructions to the Executable instructions(base instructions). And the left place is program counter(PC). ```= 00000000 <main>: 0: 10000497 auipc x9 0x10000 4: 00048493 addi x9 x9 0 8: 00400913 addi x18 x0 4 c: 00300a13 addi x20 x0 3 10: 00100e13 addi x28 x0 1 14: 068000ef jal x1 104 <print_number> 18: 10000497 auipc x9 0x10000 1c: fe848493 addi x9 x9 -24 20: 00400913 addi x18 x0 4 24: 088000ef jal x1 136 <threeConsecutiveOdds> 28: 10000497 auipc x9 0x10000 2c: fe848493 addi x9 x9 -24 30: 00900913 addi x18 x0 9 34: 00300a13 addi x20 x0 3 38: 00100e13 addi x28 x0 1 3c: 040000ef jal x1 64 <print_number> 40: 10000497 auipc x9 0x10000 44: fd048493 addi x9 x9 -48 48: 00400913 addi x18 x0 4 4c: 060000ef jal x1 96 <threeConsecutiveOdds> 50: 10000497 auipc x9 0x10000 54: fe448493 addi x9 x9 -28 58: 00900913 addi x18 x0 9 5c: 00300a13 addi x20 x0 3 60: 00100e13 addi x28 x0 1 64: 018000ef jal x1 24 <print_number> 68: 10000497 auipc x9 0x10000 6c: fcc48493 addi x9 x9 -52 70: 00400913 addi x18 x0 4 74: 038000ef jal x1 56 <threeConsecutiveOdds> 78: 0900006f jal x0 144 <Exit> 0000007c <print_number>: 7c: 0004a303 lw x6 0 x9 80: 00448493 addi x9 x9 4 84: 00030513 addi x10 x6 0 88: 00100893 addi x17 x0 1 8c: 00000073 ecall 90: 10000517 auipc x10 0x10000 94: ff350513 addi x10 x10 -13 98: 00400893 addi x17 x0 4 9c: 00000073 ecall a0: fff90913 addi x18 x18 -1 a4: fdc95ce3 bge x18 x28 -40 <print_number> a8: 00008067 jalr x0 x1 0 000000ac <threeConsecutiveOdds>: ac: 000002b3 add x5 x0 x0 b0: 05494263 blt x18 x20 68 <false> 000000b4 <forloop>: b4: 0004a303 lw x6 0 x9 b8: 00448493 addi x9 x9 4 bc: 00137393 andi x7 x6 1 c0: 007e0a63 beq x28 x7 20 <isEqual> c4: 00000993 addi x19 x0 0 c8: 00128293 addi x5 x5 1 cc: ff22c4e3 blt x5 x18 -24 <forloop> d0: 0240006f jal x0 36 <false> 000000d4 <isEqual>: d4: 00198993 addi x19 x19 1 d8: 01498463 beq x19 x20 8 <true> dc: fd9ff06f jal x0 -40 <forloop> 000000e0 <true>: e0: 10000517 auipc x10 0x10000 e4: f8e50513 addi x10 x10 -114 e8: 00400893 addi x17 x0 4 ec: 00000073 ecall f0: 00008067 jalr x0 x1 0 000000f4 <false>: f4: 10000517 auipc x10 0x10000 f8: f6450513 addi x10 x10 -156 fc: 00400893 addi x17 x0 4 100: 00000073 ecall 104: 00008067 jalr x0 x1 0 00000108 <Exit>: 108: 00a00893 addi x17 x0 10 10c: 00000073 ecall ``` --- ## Reference [Leetcode 1550](https://leetcode.com/problems/three-consecutive-odds/) [Computer Organization and Architecture](https://www.geeksforgeeks.org/computer-organization-and-architecture-pipelining-set-1-execution-stages-and-throughput/) [The RISC-V Instruction Set Manual](https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf) [2021 Homework1: RISC-V Assembly and Instruction Pipeline](https://hackmd.io/qcy4ey6CTh2ffP4t3TQfbA)