戴均原
This project involves extending an existing RISC-V processor from its original 3-stage and 5-stage pipelined designs to support the complete RV32I instruction set and the B extension. Additionally, at least two RISC-V programs from the course exercises will be selected and rewritten to utilize the B extension, ensuring proper functionality on the enhanced processor. The final implementation will be published on GitHub.
The process explores the transition from a single-cycle design to a 3-stage pipeline and subsequently to a 5-stage pipeline, focusing on critical design considerations such as forwarding and pipeline optimization.
The three-stage design is a simplified processor pipeline with three main stages: Instruction Fetch (IF), Instruction Decode (ID), and Execute (EX). It focuses on basic instruction flow without separate memory-access or write-back stages, which makes it a foundational model for understanding pipeline operation.
Control.scala
Extends the pipeline to include Memory Access (MEM) and Write Back (WB) stages. This version handles data hazards by stalling the pipeline, pausing the flow of instructions when dependencies are detected, ensuring correctness at the cost of performance.
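As a rough sketch of this stalling policy, the plain-Scala model below (not Chisel hardware) reproduces the control decision for a single cycle. The parameter names mirror the Control.scala signals described in this section. The object StallControlModel, the ControlOut case class, and the assumption that a jump takes priority over a data hazard (as it would in a when/elsewhen chain) are illustrative and not taken from the project's source.

```scala
// Plain-Scala model of the stall-based hazard control (illustrative, not the project's Chisel code).
object StallControlModel {
  // Control outputs for one cycle; every signal defaults to false (no flush, no stall).
  final case class ControlOut(
      ifFlush: Boolean = false,
      idFlush: Boolean = false,
      pcStall: Boolean = false,
      ifStall: Boolean = false
  )

  // True when an older instruction will write a register read by the instruction in ID.
  // Writes to x0 never create a dependency.
  private def conflicts(writeEnable: Boolean, rd: Int, rs1: Int, rs2: Int): Boolean =
    writeEnable && rd != 0 && (rd == rs1 || rd == rs2)

  def control(
      jumpFlag: Boolean,
      regWriteEnableEx: Boolean, rdEx: Int,
      regWriteEnableMem: Boolean, rdMem: Int,
      rs1Id: Int, rs2Id: Int
  ): ControlOut =
    if (jumpFlag) {
      // Jump detected: discard the wrongly fetched instructions in IF and ID.
      ControlOut(ifFlush = true, idFlush = true)
    } else if (conflicts(regWriteEnableEx, rdEx, rs1Id, rs2Id) ||
               conflicts(regWriteEnableMem, rdMem, rs1Id, rs2Id)) {
      // Data hazard: insert a bubble after ID and hold PC and IF in place.
      ControlOut(idFlush = true, pcStall = true, ifStall = true)
    } else {
      ControlOut()
    }
}
```

For example, with add x5, x1, x2 in EX and sub x6, x5, x3 in ID, rd_ex equals rs1_id (5), so the model stalls PC and IF and flushes ID until the result is written back.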
Control.scala
By default, all control outputs are set to false.B so that no flushing or stalling occurs unless one of the conditions below is met.
If io.jump_flag is asserted (true.B), a jump instruction has been detected. In response:
- The IF flush signal is set to true.B, flushing the instruction fetch (IF) stage.
- io.id_flush is set to true.B, flushing the instruction decode (ID) stage.
A data hazard is detected when either of the following holds:
- The instruction in the EX stage writes a register (io.reg_write_enable_ex is true), its destination register (io.rd_ex) matches one of the source registers (io.rs1_id or io.rs2_id) of the instruction in ID, and the destination is not the zero register (io.rd_ex =/= 0.U).
- The instruction in the MEM stage writes a register (io.reg_write_enable_mem is true), its destination register (io.rd_mem) matches one of the source registers (io.rs1_id or io.rs2_id), and the destination is not the zero register (io.rd_mem =/= 0.U).
When a data hazard is detected:
- io.id_flush := true.B: Flush the instruction decode (ID) stage, inserting a bubble so the dependent instruction does not proceed with stale operands.
- io.pc_stall := true.B: Stall the program counter (PC) to pause instruction fetching.
- io.if_stall := true.B: Stall the instruction fetch (IF) stage to prevent fetching new instructions.

The forwarding version builds on the five-stage model but adds forwarding (bypassing) to reduce stalls. It resolves data hazards by forwarding results from later stages directly to earlier stages, which improves performance compared to stalling.
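The forwarding decision for one EX-stage operand can be sketched in the same plain-Scala style. This hypothetical ForwardingModel uses the NoForward / ForwardFromMEM / ForwardFromWB encoding (0, 1, 2) described for Control.scala. It assumes MEM is checked before WB, because the instruction in MEM is younger and therefore holds the more recent value of the register. The operand function is an illustrative stand-in for the EX-stage multiplexer.

```scala
// Plain-Scala model of EX-stage forwarding selection (illustrative, not the project's Chisel code).
object ForwardingModel {
  // Encoding described in the text.
  final val NoForward = 0
  final val ForwardFromMEM = 1
  final val ForwardFromWB = 2

  // Choose the source for one operand (rs1_ex or rs2_ex).
  // MEM has priority: if both stages write the same register, MEM holds the newer value.
  def select(rsEx: Int,
             regWriteEnableMem: Boolean, rdMem: Int,
             regWriteEnableWb: Boolean, rdWb: Int): Int =
    if (regWriteEnableMem && rdMem != 0 && rdMem == rsEx) ForwardFromMEM
    else if (regWriteEnableWb && rdWb != 0 && rdWb == rsEx) ForwardFromWB
    else NoForward

  // Operand multiplexer: register-file value unless a forwarded value is selected.
  // Uppercase names in the patterns refer to the constants above.
  def operand(sel: Int, regFileValue: Int, memValue: Int, wbValue: Int): Int =
    sel match {
      case ForwardFromMEM => memValue
      case ForwardFromWB  => wbValue
      case _              => regFileValue
    }
}
```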
Control.scala
The forwarding select signal takes one of three values:
- NoForward: No data forwarding is required (default value: 0).
- ForwardFromMEM: Forward data from the MEM stage to the EX stage (value: 1).
- ForwardFromWB: Forward data from the WB stage to the EX stage (value: 2).
The select value for rs1_ex is chosen as follows:
- If the instruction in the MEM stage writes a register (reg_write_enable_mem is true), its destination register (rd_mem) matches rs1_ex, and rd_mem is not the zero register (rd_mem =/= 0.U), data is forwarded from the MEM stage.
- Otherwise, if the instruction in the WB stage writes a register (reg_write_enable_wb is true), its destination register (rd_wb) matches rs1_ex, and rd_wb is not the zero register (rd_wb =/= 0.U), data is forwarded from the WB stage.

The final five-stage design combines stalling and forwarding: forwarding resolves most data hazards, while stalling and flushing handle the hazards that forwarding alone cannot resolve, including the control hazards caused by jumps. This makes it the most refined and efficient of the pipeline designs.
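One case that forwarding alone typically cannot cover in a classic five-stage pipeline is a load whose result is needed by the very next instruction, because the loaded value only becomes available at the end of MEM. The sketch below assumes this load-use case is what the combined design stalls for; the memReadEx flag and the LoadUseModel object are hypothetical names, not signals from the project.

```scala
// Plain-Scala model of a load-use stall check (illustrative assumption, not the project's code).
object LoadUseModel {
  // memReadEx: true when the instruction currently in EX is a load (hypothetical signal).
  // Stall one cycle if that load writes a non-zero register read by the instruction in ID.
  def mustStall(memReadEx: Boolean, rdEx: Int, rs1Id: Int, rs2Id: Int): Boolean =
    memReadEx && rdEx != 0 && (rdEx == rs1Id || rdEx == rs2Id)
}
```

After the one-cycle stall, the load has reached WB when the dependent instruction enters EX, so its value can be supplied through the ForwardFromWB path instead of stalling further.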
It has been recognized that there is need to officially standardize a "B" extension - that represents the collection of the Zba, Zbb, and Zbs extensions - for the sake of consistency and conciseness across toolchains and how they identify support for these bitmanip extensions (which, for example, are mandated in RVA and RVB profiles). In conjunction with this an official definition of the misa.B bit will be established - along the lines of misa.B=1 indicating support for at least these three extensions (and misa.B=0 indicating that one or more may not be supported).
https://www.ece.lsu.edu/ee4720/doc/riscv-bitmanip-1.0.0.pdf
http://riscvbook.com/chinese/RISC-V-Reader-Chinese-v2p1.pdf
sh1add:
Shifts the first operand left by 1 (multiplies it by 2) and adds the result to the second operand.
sh2add:
Similar to sh1add, but shifts the first operand left by 2 (multiply by 4) before addition.
sh3add:
Extends the logic further by shifting the first operand left by 3 (multiply by 8) before addition.
andn:
Performs a bitwise AND of the first operand with the negation of the second operand.
orn:
Performs a bitwise OR of the first operand with the negation of the second operand.
xnor:
Performs a bitwise XOR followed by a NOT operation (XNOR).
min:
Compares two signed integers and selects the smaller value.
minu:
Similar to min, but operates on unsigned integers.
max:
Compares two signed integers and selects the larger value.
maxu:
Similar to max, but operates on unsigned integers.
bclr:
Clears a specific bit in the first operand, based on the position specified by the second operand.
bext:
Extracts a specific bit from the first operand, at the position specified by the second operand, and returns it in bit 0 of the result.
binv:
Inverts a specific bit in the first operand, based on the position specified by the second operand.
bset:
Sets a specific bit in the first operand, based on the position specified by the second operand.
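To check the ALU against these definitions, a software reference model can help. The sketch below implements each instruction listed above on Scala's 32-bit Int, which wraps modulo 2^32 like an RV32 register. Following the bit-manipulation specification, the single-bit instructions use only the low 5 bits of the second operand as the bit index. The object name BitManipModel is hypothetical.

```scala
// Plain-Scala reference model of the listed Zba/Zbb/Zbs instructions (RV32 semantics).
object BitManipModel {
  // Zbs bit index: only the low 5 bits of rs2 are used on RV32.
  private def index(rs2: Int): Int = rs2 & 31

  // Zba: shift rs1 left by 1, 2, or 3 and add rs2 (Int arithmetic wraps like a 32-bit register).
  def sh1add(rs1: Int, rs2: Int): Int = (rs1 << 1) + rs2
  def sh2add(rs1: Int, rs2: Int): Int = (rs1 << 2) + rs2
  def sh3add(rs1: Int, rs2: Int): Int = (rs1 << 3) + rs2

  // Zbb: logic operations with an inverted operand or result.
  def andn(rs1: Int, rs2: Int): Int = rs1 & ~rs2
  def orn(rs1: Int, rs2: Int): Int  = rs1 | ~rs2
  def xnor(rs1: Int, rs2: Int): Int = ~(rs1 ^ rs2)

  // Zbb: signed compares use Int directly; unsigned compares use Integer.compareUnsigned.
  def min(rs1: Int, rs2: Int): Int  = if (rs1 < rs2) rs1 else rs2
  def max(rs1: Int, rs2: Int): Int  = if (rs1 > rs2) rs1 else rs2
  def minu(rs1: Int, rs2: Int): Int = if (Integer.compareUnsigned(rs1, rs2) < 0) rs1 else rs2
  def maxu(rs1: Int, rs2: Int): Int = if (Integer.compareUnsigned(rs1, rs2) > 0) rs1 else rs2

  // Zbs: clear, set, invert, or extract the bit selected by rs2.
  def bclr(rs1: Int, rs2: Int): Int = rs1 & ~(1 << index(rs2))
  def bset(rs1: Int, rs2: Int): Int = rs1 | (1 << index(rs2))
  def binv(rs1: Int, rs2: Int): Int = rs1 ^ (1 << index(rs2))
  def bext(rs1: Int, rs2: Int): Int = (rs1 >>> index(rs2)) & 1
}
```

Note how minu(-1, 1) returns 1: as an unsigned value, -1 is 0xFFFFFFFF, the largest 32-bit number.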
ALU.scala
Implements an Arithmetic Logic Unit (ALU) using Chisel, a hardware description language based on Scala. The ALU supports the RISC-V RV32I instruction set and parts of the B extension (Zba, Zbb, Zbs).
ALUFunctions
ALU Class
Implements the ALUControl module, which decodes an instruction's opcode, funct3, and funct7 fields and, from the instruction type and these function codes, selects the corresponding ALU function (alu_funct).
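For the B-extension instructions listed earlier, the decoding step reduces to a lookup on (funct7, funct3) under the register-register opcode 0110011. This is a plain-Scala sketch, not the project's Chisel ALUControl: the mnemonic strings stand in for ALUFunctions values whose exact names are not shown here, and the field values follow the RISC-V Bit-Manipulation ISA-extensions specification v1.0.0 linked above.

```scala
// Plain-Scala sketch of decoding the listed B-extension R-type instructions.
object BExtDecodeModel {
  // OP opcode (0b0110011), shared with RV32I register-register instructions.
  final val OpcodeOp = 0x33

  // Map (funct7, funct3) to a mnemonic; None for anything that is not one of the 14 listed ops.
  def decode(opcode: Int, funct3: Int, funct7: Int): Option[String] =
    if (opcode != OpcodeOp) None
    else (funct7, funct3) match {
      case (0x10, 0x2) => Some("sh1add") // funct7 0010000
      case (0x10, 0x4) => Some("sh2add")
      case (0x10, 0x6) => Some("sh3add")
      case (0x20, 0x7) => Some("andn")   // funct7 0100000, same funct3 as and
      case (0x20, 0x6) => Some("orn")
      case (0x20, 0x4) => Some("xnor")
      case (0x05, 0x4) => Some("min")    // funct7 0000101
      case (0x05, 0x5) => Some("minu")
      case (0x05, 0x6) => Some("max")
      case (0x05, 0x7) => Some("maxu")
      case (0x24, 0x1) => Some("bclr")   // funct7 0100100
      case (0x24, 0x5) => Some("bext")
      case (0x34, 0x1) => Some("binv")   // funct7 0110100
      case (0x14, 0x1) => Some("bset")   // funct7 0010100
      case _           => None
    }

  // Extract opcode [6:0], funct3 [14:12], and funct7 [31:25] from a 32-bit instruction word.
  def decodeWord(inst: Int): Option[String] =
    decode(inst & 0x7f, (inst >>> 12) & 0x7, (inst >>> 25) & 0x7f)
}
```

Because andn, orn, and xnor reuse the funct3 values of and, or, and xor and differ only in funct7, the decoder has to check funct7 before falling back to the RV32I cases.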
Use the riscv32-unknown-elf or riscv64-unknown-elf toolchain to compile .s files into .asmbin files.
Since I use the B extension, I needed to download a recent version of the toolchain, because older assemblers do not recognize the Zba, Zbb, and Zbs instructions.
This program consists of two functions: logint calculates the base-2 logarithm of an input N (a0) and returns it in a0, and reverse reverses the n-bit binary representation of N (a0), where n is given in a1.
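A small reference model of the two functions is useful for checking the processor's output. The sketch assumes logint returns the integer floor of log2(N) for N > 0, and that the B-extension version can use the Zbb clz (count leading zeros) instruction. The text does not say which B instructions the rewrite actually uses, so the clz, bext, and bset correspondences in the comments are illustrative, and LogReverseModel is a hypothetical name.

```scala
// Plain-Scala reference model for logint and reverse (illustrative).
object LogReverseModel {
  // Helpers mirroring Zbs bext/bset on RV32 (bit index masked to 5 bits).
  private def bext(x: Int, i: Int): Int = (x >>> (i & 31)) & 1
  private def bset(x: Int, i: Int): Int = x | (1 << (i & 31))

  // floor(log2(n)), treating n as unsigned: the index of the highest set bit,
  // i.e. 31 - clz(n), where clz is what a Zbb clz instruction would compute.
  def logint(n: Int): Int = {
    require(n != 0, "log2 of 0 is undefined")
    31 - Integer.numberOfLeadingZeros(n)
  }

  // Reverse the low `bits` bits of n: bit i moves to bit (bits - 1 - i).
  def reverse(n: Int, bits: Int): Int = {
    var result = 0
    for (i <- 0 until bits) {
      if (bext(n, i) == 1) result = bset(result, bits - 1 - i)
    }
    result
  }
}
```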
find_binary converts a decimal number (provided in the a0 register) into its binary representation and returns the result as a decimal number whose digits are the binary digits (for example, 5 becomes 101). The function works recursively, building the binary equivalent from the least significant bit to the most significant bit.
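The recursion can be modeled in a few lines. Multiplying the partial result by 10 is one place where Zba could help: 10x = 8x + 2x, which is sh3add(x, sh1add(x, 0)). That substitution is only an illustration, not necessarily what the rewritten find_binary does, and FindBinaryModel is a hypothetical name. The model assumes 0 <= n < 1024 so that the decimal-digit result fits in a 32-bit signed integer.

```scala
// Plain-Scala model of find_binary (illustrative; assumes 0 <= n < 1024).
object FindBinaryModel {
  // Zba-style helpers: shift the first operand left by 1 or 3, then add the second.
  private def sh1add(rs1: Int, rs2: Int): Int = (rs1 << 1) + rs2
  private def sh3add(rs1: Int, rs2: Int): Int = (rs1 << 3) + rs2

  // 10 * x computed as 8x + 2x, without a multiply instruction.
  def times10(x: Int): Int = sh3add(x, sh1add(x, 0))

  // Recursive conversion: the lowest binary digit becomes the lowest decimal digit,
  // and the remaining bits (n >>> 1) are converted and shifted up one decimal place.
  def findBinary(n: Int): Int =
    if (n == 0) 0
    else (n & 1) + times10(findBinary(n >>> 1))
}
```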