邱柏穎, 黃詩哲
Clone the repository from 5-Stage-RV32I.
Because the original author set a path incorrectly, we need to fix it manually.
Then run the processor simulation with `sbt test`; the output should be
sbt.version=1.9.1
Following the sbt Reference Manual (Hello, World), we can quickly set up a simple Hello World build for our own 5-Stage-RV32I.
The tree should be
After that, we need to modify the `build.sbt` file to include Chisel as a dependency.
To test our 5-stage pipelined processor, we use `riscv-tests`.
By default, the PC start address in riscv-tests is set to `0x80000000`, but for convenience, we change it to `0x00000000`.
Then build riscv-tests.
The test files from riscv-tests are generated under `/src/target/share/riscv-tests/isa`; for instance, we can check with
We have to convert the ELF file into a `.bin` file.
Then we convert it into a `.hex` file.
The resulting `.hex` file looks like
Ref: riscv-chisel-book - Chapter 20
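The `.bin` to `.hex` step can be illustrated with a small plain-Scala sketch. It assumes the `.hex` file holds one byte per line as two hex digits, lowest address first, which fits the 8-bit-wide instruction memory described later; `BinToHex.toHexLines` is a hypothetical helper, not part of the project.

```scala
// Hypothetical helper: render raw bytes as one two-digit hex value per line.
// Assumption: the target .hex layout is one byte per line, lowest address first.
object BinToHex {
  def toHexLines(bytes: Array[Byte]): Seq[String] =
    bytes.toSeq.map(b => f"${b & 0xff}%02x") // mask so negative bytes print as 80..ff
}
```

For example, the instruction word `0x0500006f` stored in little-endian order occupies the bytes `6f 00 00 05`, so it would appear as four consecutive lines in the file.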
To streamline the process of managing and deploying the project, we use Docker to package everything. Follow the steps below to build and run the Docker container:
Run the following command in your project directory to create a Docker image:
This will build a Docker image named `riscv/our_riscv`.
Once the image is built, use the command below to start the Docker container and mount the current directory to /src inside the container:
Dockerfile:
The following image illustrates the five-stage pipelined datapath
The folder structure is as follows
The following sections introduce the contents of each module.
Memory units which are fetched during execution.
The Data Memory has five I/O ports, as follows:
I/O Port | Variable Name | Description |
---|---|---|
MemAddress | `addr` | Specifies the memory address to either read data from or write data to. |
MemWriteData | `dataIn` | The data to be written into the memory. |
MemREn | `mem_read` | Indicates whether a read operation is enabled. |
MemWEn | `mem_write` | Indicates whether a write operation is enabled. |
MemReadData | `dataOut` | Outputs the data read from the memory. |
After declaring the I/O ports, we create the memory with `Mem(1024, SInt(32.W))`,
where `1024` is the number of memory locations and `SInt(32.W)` defines the stored data type as a 32-bit signed integer.
Therefore, we have 32 bits * 1024 = 32768 bits = 4096 bytes (4 KB) of memory.
There are two types of read-write memories that can be implemented in Chisel: `SyncReadMem` and `Mem`.
- `SyncReadMem` represents synchronous-read, synchronous-write memories, where the values on the read data port are not guaranteed to be valid until the next clock cycle.
- `Mem` represents asynchronous-read, synchronous-write memories, where the value is output immediately after the address is provided.

For the implementation of the 5-stage RISC-V pipeline, we choose `Mem` due to its simpler integration, although `SyncReadMem` is closer to real-world hardware behavior, such as that of FPGAs.
For the final step, we initialize the output to `0` and perform read/write operations based on the values of `io.mem_write` and `io.mem_read`.
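The read/write behavior described above can be mirrored by a plain-Scala behavioral model, which is handy as a reference when writing tests. This is only a sketch under assumptions, not the Chisel module itself: `DataMemoryModel` is a hypothetical name, the address is treated as a word index, and `clock` stands for one rising clock edge.

```scala
// Plain-Scala behavioral model (not Chisel) of a memory with
// combinational read and clocked write, mirroring the ports described above.
class DataMemoryModel(depth: Int = 1024) {
  private val cells = Array.fill(depth)(0)

  // Read path: returns the stored word when memRead is set, otherwise 0.
  def read(addr: Int, memRead: Boolean): Int =
    if (memRead) cells(addr) else 0

  // Write path: call once per simulated clock edge.
  def clock(addr: Int, dataIn: Int, memWrite: Boolean): Unit =
    if (memWrite) cells(addr) = dataIn
}
```

Writing `42` to address `0` with `memWrite` set and then reading it back with `memRead` set returns `42`, while a read with `memRead` cleared returns `0`.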
The complete code of `Data Memory.scala` is
The Instruction Memory has two I/O ports, as follows:
I/O Port | Variable Name | Description |
---|---|---|
PC address | `addr` | PC address of the instruction. |
inst | `data` | The instruction data. |
As stated before, the instruction memory block also uses the `Mem` class for its declaration, which allocates \(2^{12}\) bytes (4 KB) in total.
We implement this module as read-only memory; therefore, there is only one input and one output.
One notable detail is that we call the `loadMemoryFromFile` function.
This function loads the contents of a file into the memory module. In this scenario, the module loads `test.txt` in the root directory of the package into the `InstMem` module. Finally, we drive the output signal with:
We divide `io.addr` by `4` to turn the byte address into a word index.
2025/01/09
To enable our module to read the `.hex` files generated by riscv-tests, we modified the module as follows:
The memory (`imem`) is defined as 4096 x 8-bit, where each location holds a single byte, so the total storage remains the same.
The module fetches 4 consecutive bytes and concatenates them to form a 32-bit instruction.
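The byte concatenation can be sketched in plain Scala as follows. The sketch assumes little-endian order (the byte at `addr` becomes bits 7..0), which is the RISC-V convention; `InstFetch.fetch` is a hypothetical helper, and each element of `imem` is expected to be a byte value in the range 0 to 255.

```scala
object InstFetch {
  // Assemble a 32-bit instruction from four consecutive bytes.
  // Assumption: little-endian order, i.e. the byte at `addr` is bits 7..0.
  def fetch(imem: IndexedSeq[Int], addr: Int): Int =
    (imem(addr + 3) << 24) | (imem(addr + 2) << 16) |
      (imem(addr + 1) << 8) | imem(addr)
}
```

With the bytes `6f 00 00 05` stored from address 0, `InstFetch.fetch(imem, 0)` yields `0x0500006f`.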
Putting it all together, we get:
Pipeline registers for storing results of the previous stage.
The IF stage to ID stage register has the following two groups of I/O ports:
I/O Port | Variable Name | Description |
---|---|---|
PC (I/O) | `pc_in(out)`, `pc4_in(out)`, `SelectedPC(out)` | PC of the instruction, PC of the next instruction, and the selected PC. |
inst (I/O) | `SelectedInstr(out)` | The instruction that has been selected. |
The `PC (I/O)` and `PC+4 (I/O)` ports correspond to the I/O ports of the `PC` module, while the `SelectedPC (I/O)` and `Selected Instruction (I/O)` ports correspond to the I/O ports of the instruction module.
This module serves as a register that saves the program counter and the fetched instruction. To speed up the pipeline, we forward `PC` and `PC+4` simultaneously.
Four registers are declared and initialized using `RegInit`, which sets their reset values:
These reset values are applied during a system reset, ensuring the hardware starts in a known state.
During normal operation, the registers are updated with the input signals every clock cycle:
This ensures that the register values are synchronized with the input signals on each clock edge.
Putting it all together, we get:
In the `ID_EX`, `EX_MEM`, and `MEM_WB` stages, we use `RegNext` to ensure that the input signals represent the next states of the registers, while the output signals reflect their current states.
The ID stage to EX stage register has the following four groups of I/O ports:
I/O Port | Variable Name | Description |
---|---|---|
PC (I/O) | `IFID_pc4_in(out)` | Program counter passed to the next stage. |
RegRead Data 1 (I/O) | `rs1_data_in(out)` | Data stored in the rs1 register. |
RegRead Data 2 (I/O) | `rs2_data_in(out)` | Data stored in the rs2 register. |
inst (I/O) | `rs1_in(out)`, `rs2_in(out)`, `rd_in(out)`, `func3_in(out)`, `func7_in(out)` | Indices of the selected registers, and the decoded parts of the instruction. |

So we have:
The EX stage to MEM stage register has the following four groups of I/O ports:
I/O Port | Variable Name | Description |
---|---|---|
PC (I/O) | None | |
ALU Out (I/O) | `alu_out` (`EXMEM_alu_out`) | Result computed by the ALU. |
RegRead Data 2 (I/O) | `IDEX_rs2` (`EXMEM_rs2_out`) | Data read from the register rs2. |
inst (I/O) | `IDEX_rd` (`EXMEM_rd_out`) | Index of the register for storing data read from memory. |

So we have:
The MEM stage to WB stage register has the following two groups of I/O ports:
I/O Port | Variable Name | Description |
---|---|---|
Reg Write Data (I/O) | `in_dataMem_out` (`MEMWB_dataMem_out`), `in_alu_out` (`MEMWB_alu_out`) | Data read from the data memory, and the result computed by the ALU (the candidates for register write-back). |
inst (I/O) | `EXMEM_rd` (`MEMWB_rd_out`) | Index of the register that will store the data. |

So we have:
To generate the control signal for the ALU, we design the `ALUDecode` unit. This unit takes `funct3`, `funct7`, and `aluOp` as inputs, which represent the instruction's function fields and operation type.
To find out which operation the `ALU` module needs to perform, we first identify the instruction type, then generate the `ALUSel` signal from `funct3` (or `funct3` + `funct7` if the instruction is `R-type`). The following shows how we generate the `ALUSel` signal:
With this module, designing the `ALU` module becomes simple.
I/O Port | Variable Name | Description |
---|---|---|
A | `in_A` | Input data A. |
B | `in_B` | Input data B. |
ALU Select | `ALUSel` | Decides which operation the ALU computes. |
ALU Output | `out` | The result of the ALU. |
In general, an ALU performs arithmetic operations (such as addition and subtraction) and logical operations (such as AND, OR, XOR); ours implements the operations listed below. The operations of the ALU are defined as:
And the ALU will execute according to the table:
switch(io.ALUSel) {
is(ADD) { io.out := io.in_A + io.in_B }
is(SUB) { io.out := io.in_A - io.in_B }
is(AND) { io.out := io.in_A & io.in_B }
is(OR) { io.out := io.in_A | io.in_B }
is(XOR) { io.out := io.in_A ^ io.in_B }
is(SLL) { io.out := io.in_A << io.in_B(4, 0) } // Limit shift amount to 5 bits
is(SRL) { io.out := io.in_A >> io.in_B(4, 0) } // Limit shift amount to 5 bits
is(SRA) { io.out := (io.in_A.asSInt >> io.in_B(4, 0)) } // Signed right shift
is(X) { io.out := 0.S }
}
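The operation table above can be cross-checked against a plain-Scala reference model. This is a sketch, not the Chisel module: `AluModel.alu` is a hypothetical helper, the operation is selected by name rather than by the actual `ALUSel` encoding, and `SRL` is modeled as a zero-filling shift as the RISC-V specification defines it.

```scala
object AluModel {
  // Operation names follow the ALUSel cases listed above.
  def alu(op: String, a: Int, b: Int): Int = {
    val shamt = b & 0x1f // only the low 5 bits select the shift amount
    op match {
      case "ADD" => a + b
      case "SUB" => a - b
      case "AND" => a & b
      case "OR"  => a | b
      case "XOR" => a ^ b
      case "SLL" => a << shamt
      case "SRL" => a >>> shamt // logical shift: zero-fill
      case "SRA" => a >> shamt  // arithmetic shift: sign-fill
      case _     => 0           // unknown selector, like the X case
    }
  }
}
```

Comparing `SRL` and `SRA` on a negative operand is a useful test case, because the two differ only in how the vacated upper bits are filled.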
Putting everything together:
There are \(6\) B-type instructions in RV32I. They are identified by `opcode`, and `func3` indicates which instruction it is.
instruction | funct3 (binary) | funct3 (decimal) |
---|---|---|
`beq` | 000 | 0 |
`bge` | 101 | 5 |
`bgeu` | 111 | 7 |
`blt` | 100 | 4 |
`bltu` | 110 | 6 |
`bne` | 001 | 1 |
Examining the I/O ports makes it clear how this module works:
`branch` indicates whether this is a `B-type` instruction, `arg_x` and `arg_y` are the values coming from `rs1` and `rs2`, `func3` indicates which B-type instruction it is, and `br_taken` is the output signal set according to the inputs.
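The decision made by this module can be summarized by a plain-Scala reference model that follows the `funct3` table above. It is a sketch for checking expected values, not the Chisel code; `BranchModel.taken` is a hypothetical helper whose parameters loosely mirror `branch`, `func3`, `arg_x`, and `arg_y`.

```scala
object BranchModel {
  // funct3 values are taken from the table above.
  def taken(branch: Boolean, funct3: Int, x: Int, y: Int): Boolean =
    branch && (funct3 match {
      case 0 => x == y                             // beq
      case 1 => x != y                             // bne
      case 4 => x < y                              // blt  (signed)
      case 5 => x >= y                             // bge  (signed)
      case 6 => Integer.compareUnsigned(x, y) < 0  // bltu (unsigned)
      case 7 => Integer.compareUnsigned(x, y) >= 0 // bgeu (unsigned)
      case _ => false                              // not a branch encoding
    })
}
```

The signed and unsigned variants disagree when one operand is negative: `-1 < 1` holds for `blt`, but as an unsigned value `-1` is `0xffffffff`, so `bltu` is not taken.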
Control signal setting:
Putting them all together:
Control signals are mapped directly from `opcode`; the table below shows the mapping:
opcode | Instruction Type |
---|---|
011 0011 | R-type |
110 0011 | B-type |
001 0011 | I-type |
010 0011 | S-type |
000 0011 | L-type |
001 0111 | AUIPC |
011 0111 | LUI |
110 1111 | JAL |
110 0111 | JALR |
The control signals can then be set accordingly. We set the control signals with Chisel's `switch` syntax:
Different types of instructions need different ways to concatenate the immediate number. The module outputs the immediate number based on the input instruction.
bits position | 31-20 | 19-15 | 14-12 | 11-7 | 6-0 |
---|---|---|---|---|---|
I | imm[11:0] | rs1 | funct3 | rd | opcode |

For I-type, we extract `inst[31:20]` and sign-extend it.
bits position | 31-25 | 24-20 | 19-15 | 14-12 | 11-7 | 6-0 |
---|---|---|---|---|---|---|
S | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
For S-type, we concatenate `inst[31:25]` with `inst[11:7]` and sign-extend the result.
bits position | 31-25 | 24-20 | 19-15 | 14-12 | 11-7 | 6-0 |
---|---|---|---|---|---|---|
B | imm[12\|10:5] | rs2 | rs1 | funct3 | imm[4:1\|11] | opcode |

For Branch-type, we concatenate `inst[31]`, `inst[7]`, `inst[30:25]`, `inst[11:8]`, and a trailing `0`, then sign-extend the result. Furthermore, we add the program counter to obtain the branch target.
bits position | 31-12 | 11-7 | 6-0 |
---|---|---|---|
U | imm[31:12] | rd | opcode |

For U-type, we extract `inst[31:12]` and fill the rest of the bits with 0.
bits position | 31-12 | 11-7 | 6-0 |
---|---|---|---|
UJ | imm[20\|10:1\|11\|19:12] | rd | opcode |

For UJ-type, we concatenate `inst[31]`, `inst[19:12]`, `inst[20]`, `inst[30:21]`, and a trailing `0`, then sign-extend the result. Furthermore, we add the program counter to obtain the jump target.
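The five immediate formats can be checked with a plain-Scala reference model that follows the bit layouts in the tables above. This is a sketch, not the Chisel `ImmGenerator`: the `ImmModel` functions are hypothetical helpers, and they return the raw sign-extended immediate without adding the program counter.

```scala
object ImmModel {
  // `>>` on Int is an arithmetic shift, so it performs the sign extension.
  def immI(inst: Int): Int = inst >> 20
  def immS(inst: Int): Int = ((inst >> 25) << 5) | ((inst >> 7) & 0x1f)
  def immB(inst: Int): Int =
    ((inst >> 31) << 12) |           // imm[12] (sign)
      (((inst >> 7) & 0x1) << 11) |  // imm[11]
      (((inst >> 25) & 0x3f) << 5) | // imm[10:5]
      (((inst >> 8) & 0xf) << 1)     // imm[4:1], bit 0 is always 0
  def immU(inst: Int): Int = inst & 0xfffff000
  def immJ(inst: Int): Int =
    ((inst >> 31) << 20) |            // imm[20] (sign)
      (((inst >> 12) & 0xff) << 12) | // imm[19:12]
      (((inst >> 20) & 0x1) << 11) |  // imm[11]
      (((inst >> 21) & 0x3ff) << 1)   // imm[10:1], bit 0 is always 0
}
```

For instance, `0x0500006f` is a `jal` with an offset of 80 bytes and `0x03ff0863` is a `beq` with an offset of 48 bytes, so `immJ` and `immB` should return 80 and 48 for them.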
The complete code is shown below.
This part does not appear directly in the five-stage pipelined datapath image, but writing this component independently makes the circuit easier to understand.
The module implements the `JALR` instruction directly by adding the generated `imm` to the value from the register to compute the destination address.
Then we align the address with a bitwise operation.
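The address computation can be written as a one-line plain-Scala reference. It assumes the alignment clears bit 0 of the sum, which is how the RISC-V specification defines `JALR`; the mask actually used in the module may differ, and `JalrModel.target` is a hypothetical helper.

```scala
object JalrModel {
  // target = (rs1 + imm) with the least-significant bit cleared.
  // Assumption: the "alignment" is the bit-0 clearing defined for JALR.
  def target(rs1: Int, imm: Int): Int = (rs1 + imm) & ~1
}
```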
The complete code is shown below.
The `PC` and `PC4` modules share a similar structure in how they handle input and output; they differ only in whether the input is incremented by 4.
PC:
PC4:
For the register file, we use `RegInit` together with `VecInit` to create 32 registers, each initialized to 0.
The `RegisterFile` module accepts two register addresses for reading and one address for writing.
When reading a register, if the address is 0, the output is always 0. Otherwise, it outputs the data stored at the specified address.
When writing to a register, the module first checks the write enable signal (`reg_write`) and ensures the target address is not 0. If both conditions are met, the data is written to the specified register.
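The read and write rules above can be captured in a small plain-Scala behavioral model, useful as a reference for the expected values in tests. It is a sketch rather than the Chisel module; `RegFileModel` is a hypothetical name.

```scala
// Behavioral model of a 32-entry register file with x0 hard-wired to zero.
class RegFileModel {
  private val regs = Array.fill(32)(0)

  // Reading address 0 always yields 0.
  def read(addr: Int): Int = if (addr == 0) 0 else regs(addr)

  // A write only takes effect when regWrite is set and the target is not x0.
  def write(addr: Int, data: Int, regWrite: Boolean): Unit =
    if (regWrite && addr != 0) regs(addr) = data
}
```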
The complete code is shown below.
This module is combinational logic that decides whether forwarding is needed. `IF_ID_inst` and `ID_EX_rd` are inputs from the pipeline registers, while `pc_in` and `current_pc` do not take part in the forwarding decision.
If an L-type (load) instruction (i.e., `io.ID_EX_memRead === 1.B`) is followed by an R-type, S-type, or B-type instruction and the registers overlap (i.e., `((io.ID_EX_rd === Rs1) || (io.ID_EX_rd === Rs2))`), we forward the instruction, the PC, and certain control signals. For example:
In this scenario, we set the signals `inst_forward`, `pc_forward`, and `ctrl_forward` to `true`; otherwise they are `false`.
Then the data are passed through the module.
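The detection condition quoted above can be expressed as a plain-Scala predicate. This is a sketch of the condition only, not of the whole module; `LoadUseModel.hazard` is a hypothetical helper whose parameters loosely mirror `ID_EX_memRead`, `ID_EX_rd`, `Rs1`, and `Rs2`.

```scala
object LoadUseModel {
  // True when the instruction in ID/EX is a load whose destination register
  // is one of the source registers of the instruction that follows it.
  def hazard(idExMemRead: Boolean, idExRd: Int, rs1: Int, rs2: Int): Boolean =
    idExMemRead && (idExRd == rs1 || idExRd == rs2)
}
```

For a load into `x3` followed by an instruction that reads `x3` and `x4`, `LoadUseModel.hazard(true, 3, 3, 4)` is `true`; if the first instruction is not a load, the predicate is `false` regardless of the register indices.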
The implementation is shown below:
This module handles hazards involving B-type instructions. We first explain the meaning of the I/O ports, then the logic of the code.
Name | I/O | Meaning |
---|---|---|
`ID_EX_RD` | Input | Index of the destination register for instruction in the ID/EX stage. |
`EX_MEM_RD` | Input | Index of the destination register for instruction in the EX/MEM stage. |
`MEM_WB_RD` | Input | Index of the destination register for instruction in the MEM/WB stage. |
`ID_EX_memRd` | Input | Whether the instruction in the ID/EX stage is L-type. |
`EX_MEM_memRd` | Input | Index of the destination register for L-type instruction in the EX/MEM stage. |
`MEM_WB_memRd` | Input | Index of the destination register for L-type instruction in the MEM/WB stage. |
`rs1` | Input | rs1 register of the B-type instruction. |
`rs2` | Input | rs2 register of the B-type instruction. |
`ctrl_branch` | Input | Whether this instruction is B-type. |
`forward_rs1` | Output | Where rs1 comes from. |
`forward_rs2` | Output | Where rs2 comes from. |
This module handles \(4\) types of hazards, which we explain with examples:
`B-type`, ALU hazard
and the circuit handling this situation:
Here `io.forward_rs1 := "b0001".U` indicates that data will be forwarded from the `EXE/MEM` register.
`B-type`, EX/MEM hazard
and the circuit handling this situation:
Here, `io.forward_rs1 := "b0010".U` means data are forwarded from the `MEM/WB` stage.
`B-type`, MEM/WB hazard
`B-type`, EXE/MEM hazard part:
and the code accordingly:
Here, `io.forward_rs1 := "b0011".U` means data will be forwarded from the `WB` stage.
Different types of instructions can all cause hazards; therefore, `forward_rs1` should be set correspondingly.
The following table shows what each value of `forward_rs1` means:
Value of `forward_rs1` | Type | Where data are forwarded from |
---|---|---|
0001 | ALU Hazard | ID/EX |
0010 | EX/MEM Hazard | EX/MEM |
0011 | MEM/WB Hazard | MEM/WB |
0110 | JALR | ID/EX |
0111 | JALR | EX/MEM |
1001 | JALR | EX/MEM |
1000 | JALR | MEM/WB |
1010 | JALR | MEM/WB |
The full implementation:
The module `Forwarding.scala` handles data hazards.
This module handles 2 types of hazards, which we explain with examples:
The EX hazard is shown below:
For the first instruction, `add`, the result needs to be stored in `x3`. However, `x3` is also required as an input for the `sub` instruction on the next line. This situation creates an EX hazard.
The situation is detected by
- `io.EXMEM_regWr === "b1".U` detects whether the write-back signal is active, which in this case means that the result of `add` will be written back to `x3`.
- `io.EXMEM_rd =/= "b00000".U` ensures that the target register is valid (not `x0`, which always holds the value 0 in the RISC-V architecture).
- The register comparison matches the destination register against the source registers, which in our case compares `x3` (in `add`) with `x3` and `x5` (in `sub`).

This logic ensures that the hazard is detected and sends the signal `b10`, allowing the processor to forward the result of `x3` directly from the `EX/MEM` stage to the next instruction before it is written back to the register file.
The complete code for handling the EX hazard is:
The MEM hazard is shown below:
For the first instruction, `lw`, the result needs to be stored in `x3`. However, the next instruction, `sub`, also requires the data in `x3`. This situation creates a MEM hazard.
The situation is detected by
- `(io.MEMWB_regWr === "b1".U)` detects whether the write-back signal is active, which in this case means that the result of `lw` will be written back to `x3`.
- `io.MEMWB_rd =/= "b00000".U` ensures that the target register is valid (not `x0`, which always holds the value 0 in the RISC-V architecture).
- `(io.MEMWB_rd === io.IDEX_rs1) && (io.MEMWB_rd === io.IDEX_rs2)` compares the register to be written with the registers used for the calculation, which in our case compares `x3` (in `lw`) with `x3` and `x5` (in `sub`).
- EX Hazard

This logic ensures that the hazard is detected and sends the signal `b01`, allowing the processor to forward the result of `x3` directly from the MEM/WB stage to the next instruction, even before it is written back to the register file.
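The two detection rules can be combined into a plain-Scala reference model for a single source register. This is a sketch of the common textbook formulation, not necessarily the exact logic in `Forwarding.scala`: it checks one operand at a time and gives the EX hazard priority over the MEM hazard, both of which are assumptions, and `ForwardModel.select` is a hypothetical helper.

```scala
object ForwardModel {
  // Returns the 2-bit select for one source register `rs`:
  // 2 (b10) = forward from EX/MEM, 1 (b01) = forward from MEM/WB, 0 = none.
  def select(rs: Int,
             exMemRegWr: Boolean, exMemRd: Int,
             memWbRegWr: Boolean, memWbRd: Int): Int =
    if (exMemRegWr && exMemRd != 0 && exMemRd == rs) 2      // EX hazard
    else if (memWbRegWr && memWbRd != 0 && memWbRd == rs) 1 // MEM hazard
    else 0
}
```

In the `add`/`sub` example, the `sub` operand `x3` matches the `EX/MEM` destination, so the select is `b10`; in the `lw`/`sub` example it matches the `MEM/WB` destination, so the select is `b01`.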
Most of the hazards are already handled by the modules above, so this module may be redundant.
Once we have completed all the modules, we can integrate them into a system.
First, we instantiate every module.
Notably, we use `dontTouch` to prevent Chisel from removing the signal during optimization. Ref : DontTouch
To fetch the instruction, we select the correct program counter (PC) based on the signal from the HazardDetect module.
The `PC+4 value` from the PC4 module is selected.
Then, we update the current program counter (PC) with `PC.io.in := PC_F`, update the next PC with `PC4.io.pc := PC.io.out.asUInt`, and fetch the instruction with `InstMemory.io.addr := PC.io.out.asUInt`.
For PC and instruction forwarding, we choose the instruction and PC using the HazardDetect module.
Finally, we pass the PC and instruction to the register (i.e., the IF_ID module), where
- `PC.io_out` represents the current PC,
- `PC4.io.out` represents the next PC (i.e., PC + 4),
- `PC_for` represents the correct PC selected by the HazardDetect module.

First, we pass the selected instruction and PC into the ImmGenerator module and forward the opcode (i.e., inst[6,0]) into the control module.
If the instruction is `R-type` (opcode = 51), `I-type` (opcode = 19), `S-type` (opcode = 35), `I-type (load instructions)` (opcode = 3), `SB-type (branch)` (opcode = 99), or `JALR` (opcode = 103), we decode `rs1` (i.e., inst[19,15]).
Likewise, we decode `rs2` (i.e., inst[24,20]) if the instruction is `R-type` (opcode = 51), `S-type` (opcode = 35), or `SB-type (branch)` (opcode = 99).
Then, we control the write signal with `RegFile.io.reg_write := control_module.io.reg_write`.
Finally, we generate the immediate value with the ImmGenerator module.
To handle structural hazards, we extract `rs1` and `rs2` from the `IF_ID instruction` and pass them to the Structural module. This allows the module to detect whether the stage requires the data.
We also receive the data for forwarding from the `MEM_WB register`.
Then we decide whether the data must come from forwarding, based on the Structural module's signal.
To detect hazards, we pass the data from the `IF_ID register` and the `ID_EX register` into the HazardDetect module.
Then, if a stall is needed (as detected by the HazardDetect module), we insert a bubble by setting all the control signals to 0.
Before passing the data into the `ID_EX register`, we handle the `branch` and `jal` operations separately; they are explained in separate sections.
Then we pass the data into the `ID_EX register`.
For Branch and JALR, we pass the data from `IF_ID`, `ID_EX`, `EX_MEM`, and `MEM_WB` into the BranchForward module to detect where `rs1` and `rs2` come from.
We use the BranchForward module to choose the forwarded `rs1` and `rs2` and pass the data into the Branch module to decide whether the branch should be taken.
We also use the JAL module to determine how the instruction jumps.
The JALR module should output the target address with correct alignment.
Finally, we update the PC according to the Hazard and Control modules.
So far, we have completed the functions for `IF`, `ID`, and `branch & jalr`. The correct registers are ready for calculation, and the PC is accurate.
First, we pass the correct `rs1`, `rs2`, and `rd` registers, together with the ALU control signal, into the ALU module.
Then, because data must be forwarded in the EX stage, we pass the data into the Forwarding module
and let the Forwarding module decide which values are passed into the ALU.
Then we pass the data into the EX_MEM register.
The only thing we have to do in the MEM stage is to pass the data into the DataMemory module.
Then we update the MEM_WB register.
For the write-back stage, we pass the write and read signals from the MEM_WB register.
Finally, we pass the data and determine which value should be written back based on the control signal.
At this point, we have completed the 5-stage RISC-V pipeline. The complete code is shown below.
In this section, we test each module using `ChiselTest`, which is based on `ScalaTest`, and using `Verilator`, which is a powerful tool for simulating Verilog modules.
Ref : Chisel Cookbook (Migrating from ChiselTest to ChiselSim)
Chisel provides tools to convert Scala code into Verilog. For example:
Once the Verilog module is generated, it can be converted into a C++ object using Verilator. For example:
A testbench can then be written in C++ (e.g., DataMemory_tb.cpp) to test the module.
For the DataMemory module, we test the `Write Data` and `Read Data` functionality separately.
First, we initialize the Data Memory and set address `0` to `42`.
To test `Write Data`, we set `mem_write` to true and step the clock.
To test `Read Data`, we set `mem_read` to true, step the clock, and expect the output to be `42`.
Finally, we test `Write Data` and `Read Data` together: we write `-15` to address `1` and read it back from the memory.
The complete code of `DataMemoryTest.scala` is
The code for generating Verilog:
and the generated Verilog code:
We wrote a testbench to test the module's functionality:
and the verification result:
For the InstMem module, we use the `.hex` file `rv32ui-p-add.hex` and extract the first four instructions, which are `0x0500006f`, `0x34202f73`, `0x00800f93`, and `0x03ff0863`.
By using `Seq`, we can generate the test cases for the module.
Then we test the module with `poke` and `expect`.
Chisel uses `h` as the prefix for hexadecimal values, instead of `0x` as used in C/C++.
Ref : Chisel Cookbook (Chisel Data Types)
To test the pipeline registers, we initialize the registers, step the clock, and validate the outputs.
The `ChiselTest` procedure remains the same; therefore, to keep the article short, the complete code is provided without additional explanation.
For the IF_ID module, we also test the initialization of the registers.
However, Chisel seems to initialize the registers to zero automatically, so we are not sure whether the test works.
The complete code :
For the Alu Control module, we test every `alu op` to ensure the result is concatenated correctly.
For the ALU module, we test all operations to ensure they produce correct results.
For the Branch module, we define a function that takes a test case as input.
Then we test all the possible branch conditions.
For the Control module, we test each opcode.
For the ImmGenerator module, we test the possible input instructions and examine the output.
For the JALR module, we test different types of input and check the resulting output.
For the PC4 module, we set the program counter to `0`, `4`, and `100`, and the output should be `pc+4`.
For the PC module, we set the program counter to `4`, `100`, and `-8` to check the behavior, and also check the case where the program counter remains the same.
For the `RegisterFile module`, we test the write and read functions and, most importantly, ensure that `x0` is always zero.
For the BranchForward module, we test `ALU hazards`, `EX/MEM hazards`, `MEM/WB hazards`, and `Jalr forwarding`.
Here, we set `rd` at the ID_EX register to `x5` (`dut.io.ID_EX_RD.poke(5.U)`) and set both `rs1` and `rs2` to `x5`.
Here, we set `rd` at the EX_MEM register to `x5` (`dut.io.EX_MEM_RD.poke(5.U)`) and set both `rs1` and `rs2` to `x5`.
Here, we set `rd` at the MEM_WB register to `x5` (`dut.io.MEM_WB_RD.poke(5.U)`) and set both `rs1` and `rs2` to `x5`.
Here, we set `ctrl_branch` to 0, which means the instruction is JALR.
The complete code is:
For the remaining Hazard Units, we applied the same testing methodology. Therefore, only the code demonstration is provided here.
To test Main, we use riscv-tests as our test bench. There are many types of riscv-tests; we choose `rv32ui-p-*` to test our 5-stage pipelined RISC-V CPU, where rv32ui stands for 32-bit RISC-V user mode with the integer base instruction set, and p denotes the target environment in which virtual memory is disabled and only core 0 boots up.
For every RISC-V test, if the CPU passes all tests, the global pointer (gp) will be set to 1. Otherwise, it will be set to a value greater than 1.
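When a test fails, the final `gp` value also tells which test case failed. The sketch below assumes the riscv-tests convention for the `p` environment, where the failure path stores `(test number << 1) | 1` in `gp`; the value is only meaningful once the program has reached its final `ecall`. `GpCheck.failingTest` is a hypothetical helper.

```scala
object GpCheck {
  // None = all tests passed; Some(n) = test case n failed.
  // Assumption: on failure gp holds (test number << 1) | 1.
  def failingTest(gp: Int): Option[Int] =
    if (gp == 1) None else Some(gp >> 1)
}
```

For example, a final `gp` of `0x00000005` would point at test case 2.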
Take `rv32ui-p-add` as an example:
If the test fails, the PC will jump to fail; otherwise, the program will complete all the tests and pass.
To check `gp`, we add a debug output in the RegisterFile module (i.e., `io.reg_debug1 := regfile(3)`, where 3 is the index of the gp register) and print it out.
For the test program, we step the clock until all the instructions have finished.
We tested all the instructions, and the gp register consistently shows `gp: 0x00000001` after running the program, indicating that the testbench has passed.
We did not pass the fence and jalr tests. The reasons are explained below.
The fence instruction serves as a memory barrier to enforce a specific ordering of loads and stores. Since we did not implement this functionality, we do not need to test `rv32ui-p-fence`.
For `jalr`, we checked the `.dump` file and noticed that the RISC-V test sets the t1 register to `0x00000010`, but the target of test 2 is at `0x000001a4`. Therefore, we believe the test fails because the testbench sets the destination of jalr incorrectly.
We test the `PIPLINE` module with the following `cpp` code. As mentioned above, `gp` (register index 3) will be set to 1 if all test cases pass. The following is the `cpp` code:
and the execution result:
We wrote an FSM to decide whether to take the branch or not. This module is placed in the instruction fetch stage and is updated when the actual result is computed.
Then we test it with
The prediction should be compared with the actual result to determine whether the instruction should be flushed.
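One common way to build such a predictor FSM is a 2-bit saturating counter. The plain-Scala sketch below illustrates that scheme only; the text above does not state which FSM the module actually uses, so the number of states, the initial state, and the name `TwoBitPredictor` are all assumptions.

```scala
// 2-bit saturating counter: states 0 and 1 predict "not taken",
// states 2 and 3 predict "taken". Assumed scheme, not the project's code.
class TwoBitPredictor {
  private var state = 0 // assumed initial state: strongly not taken

  def predictTaken: Boolean = state >= 2

  // Called once the actual branch outcome is known.
  def update(taken: Boolean): Unit =
    state = if (taken) math.min(3, state + 1) else math.max(0, state - 1)
}
```

With this scheme a single misprediction does not flip a strongly held prediction, which suits loop branches that are taken many times and fall through once.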
To support this functionality in the 5-stage pipelined RISC-V processor, we rewrite the `BRANCH` module as well.
The complete code is as follows:
Besides the predictor, we added two more modules to compute the desired program counter. The `BTB` module computes the predicted program counter, and the `PCselector` decides which program counter to take. The following explains these 2 modules.
BTB
The `BTB` module takes in the current `PC` and the current instruction, then calculates the target address and decides whether the instruction is `B-type`.
PC selection
Once the predictor is introduced, setting the new program counter becomes complicated. Therefore, we redesigned the logic for selecting the program counter.
We test our processor with branch prediction using the riscv-tests mentioned above, which set the register `gp` to \(1\) on success. The following is the testing program and testing result:
and the execution result:
To verify the branch prediction module, we compare the number of cycles required to execute the same program on the original processor and on the one with the prediction mechanism. The testing program is shown below:
and we used Verilator to monitor how many cycles are required:
and the execution result:
without branch prediction
with branch prediction
We test our processor with a program that does multiplication:
Result :
with branch prediction
without branch prediction
We tried running bubble sort on our processor. The following is the testing program:
We could not pass this one, and our guess is that the memory map is problematic. Our first hypothesis is a stack overflow: because we did not allocate enough space when designing the data memory module, the `lw ra,44(sp)` instruction did not work as expected, which broke the return. The second hypothesis is a segmentation fault: because we did not specify the memory map during linking, the return address is lost during sorting. We will try to fix this problem as soon as possible.
Brute Force solution
We assign the array data with `lw` instructions and set the head of the array to `0`, and the program became runnable.
with predictor
without predictor
Although this method solves the problem, it is too inefficient. We are trying to solve it with a linker script.
In general, the branch predictor successfully reduces the number of cycles.