# Assignment1: RISC-V Assembly and Instruction Pipeline contributed by < `OscarShiang` > ## Counting Bits ([LeetCode 338](https://leetcode.com/problems/counting-bits/)) The solution is separated into 2 parts: - iterate from 0 to n - count 1's of each number To compute the 1's of a number, I apply the idea that the result of `a & (a - 1)` will always be the value of a without the least significant 1. For example: If $a = 5$, the binary representation is: $$ 5_{10} = 0101_2 $$ Now we can apply `a & (a - 1)` on it: $$ 0101_2\ \&\ 0100_2 = 0100_2 - (1) $$ After the manipulation, bit 1 at position 0 is cleaned. Since the value of a is not 0, we can apply the rule again: $$ 0100_2\ \&\ 0011_2 = 0000_2 - (2) $$ Bit 1 at position 2 is unset, we can now stop the computation. Because we have applied the rule twice (1, 2 above), we know that the number of 1's is `2`. ## Implementation ### C code :::info Alternatively, check the branchless implementation of `popcnt`. See https://github.com/jserv/amacc/blob/master/amacc.c#L504 :notes: jserv ::: ```c #include <stdint.h> #include <stdio.h> #include <stdlib.h> int popcnt(uint32_t a) { int cnt = 0; while (a != 0) { a &= (a - 1); cnt++; } return cnt; } int *count_bits(int n, int *rsize) { int *res = malloc((n + 1) * sizeof(int)); for (int i = 0; i <= n; i++) { res[i] = popcnt(i); } *rsize = n + 1; return res; } int main(void) { int rsize; int *res = count_bits(5, &rsize); printf("[%d", res[0]); for (int i = 1; i < rsize; i++) { printf(",%d", res[i]); } printf("]\n"); free(res); return 0; } ``` ### assembly code To store the result array, I allocate the space on the stack and pass the starting address into `count_bits` routine. After filling the values into the array, then it will call `print` to list all the values in the array. ``` .data n: .word 5 lbracket: .string "[" rbracket: .string "]\n" comma: .string "," .text main: # allocate space for returned array lw t0, n slli t0, t0, 2 sub sp, sp, t0 # count bits lw a0, n mv a1, sp jal count_bits # print the result mv a0, sp lw a1, n slli a1, a1, 2 jal print # restore sp lw t0, n slli t0, t0, 2 add sp, sp, t0 # exit the program li a7, 10 ecall popcnt: li t2, 0 # population mv t3, a0 popcnt_conti: beq t3, x0, popcnt_end # apply a &= (a - 1) addi t4, t3, -1 and t3, t3, t4 addi t2, t2, 1 j popcnt_conti popcnt_end: mv a0, t2 ret count_bits: addi sp, sp, -4 sw ra, 0(sp) # store the return address mv t5, a1 # base address of the array mv t0, a0 # n li t1, 0 # i loop: mv a0, t1 # load the argument jal popcnt sw a0, 0(t5) addi t1, t1, 1 addi t5, t5, 4 bge t0, t1, loop lw ra, 0(sp) addi sp, sp, 4 ret print: mv t0, a0 # sp value mv t1, a1 # boundary add t1, t1, t0 # print left bracket la a0, lbracket li a7, 4 ecall # print first number lw a0, 0(t0) li a7, 1 ecall addi t0, t0, 4 blt t1, t0, print_end print_conti: # print comma la a0, comma li a7, 4 ecall # print number lw a0, 0(t0) li a7, 1 ecall addi t0, t0, 4 bge t1, t0, print_conti print_end: # print right bracket la a0, rbracket li a7, 4 ecall ret ``` ## Analysis ### Pseudo Instructions When we load the assembly code into Ripes, it will translate the peuso instructions in our source code into the actual readable instructions. Also, the registers' names will be changed to consistent form. The following is a part of the assembly code that is translated by Ripes: ``` 00000000 <main>: 0: 10000297 auipc x5 0x10000 4: 0002a283 lw x5 0 x5 8: 00229293 slli x5 x5 2 c: 40510133 sub x2 x2 x5 10: 10000517 auipc x10 0x10000 14: ff052503 lw x10 -16 x10 18: 00010593 addi x11 x2 0 1c: 054000ef jal x1 84 <count_bits> 20: 00010513 addi x10 x2 0 24: 10000597 auipc x11 0x10000 28: fdc5a583 lw x11 -36 x11 2c: 00259593 slli x11 x11 2 30: 078000ef jal x1 120 <print> 34: 10000297 auipc x5 0x10000 38: fcc2a283 lw x5 -52 x5 3c: 00229293 slli x5 x5 2 40: 00510133 add x2 x2 x5 44: 00a00893 addi x17 x0 10 48: 00000073 ecall ``` The instructions like `li` and `ret` have been replaced with other machine-readable forms. (The mapping from pseudo instruction to the actual instruction can be found in [RISC-V Assembly Programmer's Manual](https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#a-listing-of-standard-risc-v-pseudoinstructions)) ### 5-stage in-order processor with hazard detection and forwarding There are several processor types in Ripes, including **single-cycle processor**, **5-stage processor**, **6-stage dual-issue processor**, and so on. The one we demonstrate here is **5-stage processor with hazard detection and forwarding**. This processor will run the given instructions sequentially. When the hazards occur, the processor will generate stalls to defer the incoming instructions or send the signals to reset the pipeline registers if needed. The processor has 5 stages to process the instructions, which is: 1. `IF`: fetch each instruction 2. `ID`: decode the instruction and load the value from the registers 3. `EX`: perform arithmetic operations according to the instruction 4. `MEM`: load/store the value from/to the memory if needed 5. `WB`: write the value back to the registers The value processed in each stage will be stored temporarily in the pipeline registers (the box diagram between each stage) The block diagram of this processor is: ![](https://i.imgur.com/3gBLHDB.png) ### Data Hazards Data hazard occurs at the beginning of this program. When we want to load the value `n` and perform left shifts on it, this forms a RAW type data hazard. The instructions and their corresponding processor stages are shown below: ![](https://i.imgur.com/I4pdXZB.png) We can see that the `slli` instruction on the left try to read the value from `x5` register while the `lw` instruction on the right is not yet completed. The processor then detects the data hazard and sends the reset signal back to `ID/EX` registers to clean the value of them. So that the `slli` instruction will be deferred until the value is stored into `x5`. In the diagram below, we can find the signal for `IDEX` register is set to `0x00000000`, thus generate a stall between the `lw` and `slli` instruction. ![](https://i.imgur.com/U1o4vQf.png) ### Control Hazards Control hazards may occur around the branch type instructions (include `beq`, `blt`, and `j`, etc.) In this program, the control hazard appear in the following condition: ![](https://i.imgur.com/TFFEZzP.png) The program is going to jump into the `count_bits` labeled address. Since it is a pipeline processor, it try to fetch the next 2 instruction just after the `jal` instruction. ![](https://i.imgur.com/9cmF9zO.png) After the `jal` signal steps out `EX` stage, the processor realizes that control hazard occurs and flushes the previous 2 fetched instruction. Because the 2 instruction will no longer to be executed in the next few cycles. Besides creating stalls to eliminate the effect of control hazard, the processor fetch the accurate instruction according to the new value of `PC`, which is the `addi` instruction on the left the above image. ### Environment Calls To perform some functionalities that is not provided by the ISA, Ripes defines a interface and several system calls to deal with this problem. The description of the system calls can be found in [Environment Calls for Ripes](https://github.com/mortbopet/Ripes/wiki/Environment-calls). We use `print_int`, `print_string` and `exit` in this program ## References - [Counting Bits - LeetCode](https://leetcode.com/problems/counting-bits/) - [RISC-V Assembly Programmer's Manual](https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md) - [RISC-V Green Card](https://www.cl.cam.ac.uk/teaching/1617/ECAD+Arch/files/docs/RISCVGreenCardv8-20151013.pdf) - [Environment Calls for Ripes](https://github.com/mortbopet/Ripes/wiki/Environment-calls) ###### tags: `arch2021`