Try   HackMD

Assignment1: RISC-V Assembly and Instruction Pipeline

contributed by < shauming1020 >

tags: riscv

Power Function

Given an integer x and a positive number y, write a function that computes

xy.

I referenced the Write an iterative O(Log y) function for pow(x, y) to implement a function with

O(Logy) time complexity and
O(1)
extra space.

C-code

// Iterative C program to implement pow(x, n) #include <stdio.h> /* Iterative Function to calculate (x^y) in O(logy) */ int power(int x, unsigned int y) { int res = 1; // Initialize result while (y > 0) { // If y is odd, multiply x with result if (y & 1) res = res * x; // n must be even now y = y >> 1; // y = y/2 x = x * x; // Change x to x^2 } return res; }

Obviously, the function contains loops and conditional branches.

RISC-V Assembly-code

.data x: .word 3 y: .word 5 # unsigned str1: .string " ^ " str2: .string " = " .text main: lw s0, x # Set s0 equal to the parameter x lw s1, y # Set s1 equal to the parameter y mv a0, s0 # Set arguments mv a1, s1 jal power # return the result a0 mv s2, a0 # Set s2 equal to the result j print power: addi sp, sp, -12 sw ra, 0(sp) sw s0, 4(sp) sw s1, 8(sp) mv s0, a0 # s0 -> parameter x mv s1, a1 # s1 -> parameter y li s2, 1 # s2 -> res, init = 1 loop: bge x0, s1, Ret # if (0 >= y) break andi t0, s1, 0x1 # t0 -> y & 0x1 li t1, 0x1 # t1 -> 0x1 li ra, 0x54 # Set ra equal to srli s1, s1, 1 beq t0, t1, odd # y is odd srli s1, s1, 1 # y_u >> 1, y /= 2 mul s0, s0, s0 # x = x * x j loop # goto loop odd: mul s2, s2, s0 # res = res * x jr ra # goto srli s1, s1, 1 Ret: mv a0, s2 # Set return reg a0 lw ra, 0(sp) # Restore ra lw s0, 4(sp) # Restore s0 lw s1, 8(sp) # Restore s1 addi sp, sp, 12 # Free space on the stack for the 3 words jr ra # Return to the caller print: mv t0, a0 # Set tmp equal to a0 mv a0, s0 # prepare to print x li a7, 1 # print int ecall la a0, str1 # prepare to print string ^ li a7, 4 # print string ecall mv a0, s1 # prepare to print y li a7, 1 # print int ecall la a0, str2 # prepare to print string = li a7, 4 # print string ecall mv a0, s2 # prepare to print result li a7, 1 # print int ecall

Show


Instruction Pipeline

Pipeline contains main five stages, IF, ID, EX, MEM and WB
ISA: RV32IM

Instruction fetch (IF)

  • The Program Counter (PC), is a register that holds the address that is presented to the instruction memory.

    0x00000000 present the address of instruction 'auipc x8, 0x10000'
    0x00000004 present the address of next instruction 'lw x8, 0(x8)'

  • Next PC is calculated by PC + 4, or take the result of a branch / jump calculation as the next PC.

Instruction Decode (ID)

  • Instructions are presented with 4 bytes, and it has four types in RV32I ISA.
  • The indexes of these two registers are identified within the instruction, and the indexes are presented to the register memory, as the address.

EXE

  • Consists of an ALU and a bit shifter, the ALU for performing boolean operations (and, or, not, nand, nor, xor, xnor) and for perfoming integer addition and subtraction.

MEM

  • Access the data memory with load / store operations.
  • Use MemWrite / MemRead signal to control the behavor.

WB

  • Save the result to register file.

Data Hazards

  • When instructions that exhibit data dependence modify data in different stages of a pipeline.
    • e.g. RAW (Read After Write)
    • i1. R2 <- R5 + R3
    • i2. R4 <- R2 + R3
    • A data dependency between i1 and i2, that may use the wrong value at R2 to compute in i2, because i1 not yet finish in WB stage.

Detection

  • Forwarding detection, check the register of ID/EX and EX/MEM (MEM/WB) whether have the same register file.
  • If data hazard happens, then it will take the forwarding or stall mechanism.

Forwarding (bypassing)

  • When we compute the ALU result value in EXE stage, then pass the value to the next stage.

Stall

  • load-use data hazard, when a destination register of a load instruction equal to the source register of a instruction that is following this load instruction, then we cannot use the forwarding to solve this type of data hazard.
  • Stall one cycle to keep the forwarding mechanism working
    • load-use with x9 register

    • stall one cycle, then ALU res can pass to add x9, x8, x9


Description

main

main: lw s0, x # Set s0 equal to the parameter x lw s1, y # Set s1 equal to the parameter y mv a0, s0 # Set arguments mv a1, s1 jal power # return the result a0 mv s2, a0 # Set s2 equal to the result j print
  • lw x8, 0(x8)

    • ID, set wr_reg_idx: 0x08

    • EXE, when auipc x8 0x10000 done in EX/MEM, the alures_out will be set equal to 0x10000000(x8) and forwarding to EXE(lw), then we can compute the ALU res by x0 immediately to avoid the data hazard suffering.

    • MEM, access Data memory.

    • WB, the target register had been set to wr_reg_idx: 0x08 in ID stage, we just assign the value to the target register in this stage.


    • After WB, register x8 be set equal to the parameter x.

  • jal x1 0x24 < power >

    • EXE, we can fetch the target instruction address by ALU res in this stage, and two unuseful instruction had been fetch in ID, IF, so we need to flush them in next stage.

    • MEM, flush them.


power

power: addi sp, sp, -12 sw ra, 0(sp) sw s0, 4(sp) sw s1, 8(sp) mv s0, a0 # s0 -> parameter x mv s1, a1 # s1 -> parameter y li s2, 1 # s2 -> res, init = 1

We need to store the return address (ra) and store the saved register because we may use it.

  • sw x1, 0(x2)
    • IF, get the value of source register x1
    • EXE, forwarding x2 computed by addi then we can get the 0(x2) in this stage.

    • MEM, activate the Wr_en signal and write the value to stack memory.

loop

loop: bge x0, s1, Ret # if (0 >= y) break andi t0, s1, 0x1 # t0 -> y & 0x1 li t1, 0x1 # t1 -> 0x1 li ra, 0x54 # Set ra equal to srli s1, s1, 1 beq t0, t1, odd # y is odd srli s1, s1, 1 # y_u >> 1, y /= 2 mul s0, s0, s0 # x = x * x j loop # goto loop
  • bge x0 x9 40 < Ret >

    • EXE, when branch condition is met, PC will be set to ALU res, so that can fetch the correct instruction in next stage, avoid the branch hazard.
    • MEM, flush two unuseful instruction with nop.
  • li ra, 0x54

    • Need to set the return address with address of 'srli s1, s1, 1' before go into odd, so that can execute the correct instruction after doing the mul s2, s2, s0.


      here we can see the address of 'srli x9, x9, 1' is 0x54.

Ret

Ret: mv a0, s2 # Set return reg a0 lw ra, 0(sp) # Restore ra lw s0, 4(sp) # Restore s0 lw s1, 8(sp) # Restore s1 addi sp, sp, 12 # Free space on the stack for the 3 words jr ra # Return to the caller

Before return, we need to set the return value to a0 and restore ra, saved registers.