李皓翔
The single-cycle CPU divides the execution of each instruction into five distinct stages: instruction fetch, instruction decode, execute, memory access, and write-back.
Tasks of the Instruction Fetch Stage:
This part checks `jump_flag_id` to determine whether a J-type or B-type instruction is redirecting the program flow. If it is, the address of the next instruction is set to `jump_address_id`. Otherwise, it is set to `pc + 4.U`. The corresponding instruction is then fetched from memory and passed to the next stage.
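The next-address selection described above can be sketched as a small behavioral model in Python. This is not the Chisel source: the function name `next_pc` and the 32-bit wrap-around are assumptions made for illustration, while `jump_flag_id` and `jump_address_id` are the signal names from the text.

```python
def next_pc(pc: int, jump_flag_id: bool, jump_address_id: int) -> int:
    """Behavioral model of the fetch-stage next-address selection.

    Assumes a 32-bit program counter that wraps around on overflow.
    """
    if jump_flag_id:
        # A jump or taken branch redirects the fetch to the target address.
        return jump_address_id & 0xFFFFFFFF
    # Otherwise fetch the next sequential 4-byte instruction.
    return (pc + 4) & 0xFFFFFFFF
```

For example, `next_pc(0x1000, False, 0x2000)` selects the sequential address `0x1004`, while setting the flag selects `0x2000`.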
Tasks of the Instruction Decode Stage:
In the Decode stage, the instruction is first decomposed into `opcode`, `funct3`, `funct7`, `rd`, `rs1`, and `rs2`. Based on the opcode, the type of the instruction can be identified, as shown in the table below.
opcode | Instruction Type |
---|---|
011 0011 | R-type |
110 0011 | B-type |
001 0011 | I-type |
010 0011 | S-type |
000 0011 | L-type |
001 0111 | AUIPC |
011 0111 | LUI |
110 1111 | JAL |
110 0111 | JALR |
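As an illustration of the decomposition, the following Python sketch extracts the fields from a 32-bit instruction word and looks up its type using the opcode table above. The names `decode_fields` and `OPCODE_TYPES` are hypothetical and do not come from the Chisel code; the bit positions are those of the standard RV32I encoding.

```python
# Opcode-to-type mapping, copied from the table above.
OPCODE_TYPES = {
    0b0110011: "R-type",
    0b1100011: "B-type",
    0b0010011: "I-type",
    0b0100011: "S-type",
    0b0000011: "L-type",
    0b0010111: "AUIPC",
    0b0110111: "LUI",
    0b1101111: "JAL",
    0b1100111: "JALR",
}


def decode_fields(inst: int) -> dict:
    """Split a 32-bit RISC-V instruction word into its fixed-position fields."""
    return {
        "opcode": inst & 0x7F,          # bits 6-0
        "rd": (inst >> 7) & 0x1F,       # bits 11-7
        "funct3": (inst >> 12) & 0x7,   # bits 14-12
        "rs1": (inst >> 15) & 0x1F,     # bits 19-15
        "rs2": (inst >> 20) & 0x1F,     # bits 24-20
        "funct7": (inst >> 25) & 0x7F,  # bits 31-25
    }


# add x1, x2, x3 is encoded as 0x003100B3.
fields = decode_fields(0x003100B3)
instruction_type = OPCODE_TYPES[fields["opcode"]]
```

Not every field is meaningful for every type; for instance, `funct7` and `rs2` overlap the immediate in I-type instructions.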
The encoding formats are as follows.

R-type:

31-25 | 24-20 | 19-15 | 14-12 | 11-7 | 6-0 |
---|---|---|---|---|---|
funct7 | rs2 | rs1 | funct3 | rd | opcode |

I-type:

31-20 | 19-15 | 14-12 | 11-7 | 6-0 |
---|---|---|---|---|
imm[11:0] | rs1 | funct3 | rd | opcode |

S-type:

31-25 | 24-20 | 19-15 | 14-12 | 11-7 | 6-0 |
---|---|---|---|---|---|
imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |

B-type:

31-25 | 24-20 | 19-15 | 14-12 | 11-7 | 6-0 |
---|---|---|---|---|---|
imm[12\|10:5] | rs2 | rs1 | funct3 | imm[4:1\|11] | opcode |

U-type:

31-12 | 11-7 | 6-0 |
---|---|---|
imm[31:12] | rd | opcode |

J-type:

31-12 | 11-7 | 6-0 |
---|---|---|
imm[20\|10:1\|11\|19:12] | rd | opcode |

Depending on the type, the corresponding control signals and the value of the immediate are handled separately.
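To make the per-type immediate handling concrete, here is a Python sketch that reassembles and sign-extends the immediate for each encoding format. The function names and the single-letter format labels are assumptions for illustration; the bit layouts follow the standard RV32I encodings.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def immediate(inst: int, fmt: str) -> int:
    """Reassemble the immediate of a 32-bit instruction for the given format."""
    if fmt == "I":
        return sign_extend(inst >> 20, 12)
    if fmt == "S":
        return sign_extend(((inst >> 25) << 5) | ((inst >> 7) & 0x1F), 12)
    if fmt == "B":
        imm = (
            (((inst >> 31) & 0x1) << 12)    # imm[12]
            | (((inst >> 7) & 0x1) << 11)   # imm[11]
            | (((inst >> 25) & 0x3F) << 5)  # imm[10:5]
            | (((inst >> 8) & 0xF) << 1)    # imm[4:1]
        )
        return sign_extend(imm, 13)
    if fmt == "U":
        return sign_extend(inst & 0xFFFFF000, 32)
    if fmt == "J":
        imm = (
            (((inst >> 31) & 0x1) << 20)     # imm[20]
            | (((inst >> 12) & 0xFF) << 12)  # imm[19:12]
            | (((inst >> 20) & 0x1) << 11)   # imm[11]
            | (((inst >> 21) & 0x3FF) << 1)  # imm[10:1]
        )
        return sign_extend(imm, 21)
    raise ValueError(f"unknown format: {fmt}")
```

For example, `immediate(0xFE000EE3, "B")` (the encoding of `beq x0, x0, -4`) yields `-4`. B-type and J-type immediates always have bit 0 equal to zero, which is why that bit is not stored in the instruction.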
Tasks of the Execute Stage:
In the `ALUControl`, the `opcode` is used to determine the corresponding instruction type, and each instruction is mapped to its respective `alu_funct`. For example, `lui`, `auipc`, and load instructions are mapped to the `add` operation.

The ALU performs operations on the input operands `op1` and `op2` according to the corresponding instruction.
In the `Execute` module, the `ALU` and `ALUControl` are instantiated. The specific ALU computation logic is handled within the `ALU` module, so the `Execute` module only needs to assign values to the input ports of the `ALU` and determine whether to perform a jump.
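For unconditional jumps (`jal` and `jalr`), the jump is executed directly.

In the memory access stage, `memory_read_enable` is set to `1` for load instructions and `memory_write_enable` is set to `1` for store instructions. Based on the instruction, the corresponding read or write operation is performed.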
In the writeback stage, `regs_write_source` is used to determine the value to be written, which can be one of the following:

- `alu_result`
- `memory_read_data`
- `instruction_address + 4.U`
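The write-back selection is a three-way multiplexer, modeled below in Python. The string labels used for `regs_write_source` are hypothetical; the actual encoding used in the Chisel code is not shown in this report.

```python
def write_back_data(
    regs_write_source: str,
    alu_result: int,
    memory_read_data: int,
    instruction_address: int,
) -> int:
    """Select the value written to the register file.

    The source labels ("memory", "next_instruction_address") are
    illustrative placeholders, not the encoding used by the real design.
    """
    if regs_write_source == "memory":
        # Load instructions write the data read from memory.
        return memory_read_data
    if regs_write_source == "next_instruction_address":
        # jal and jalr write the return address (instruction_address + 4).
        return (instruction_address + 4) & 0xFFFFFFFF
    # All other register-writing instructions write the ALU result.
    return alu_result
```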
In `CPU.scala`, all components are instantiated and connected together.
Using the `IF2ID` and `ID2EX` pipeline registers, the pipeline is divided into three stages: IF, ID, and EX, where the EX stage also performs memory access and write-back.
The hazards in a pipeline can be divided into data hazards and control hazards:
Example of a data hazard:

instruction | 1 | 2 | 3 | 4 |
---|---|---|---|---|
add x1, x2, x3 | IF | ID | EX/MEM/WB | |
sub x4, x5, x1 | | IF | ID | EX/MEM/WB |
On a jump or taken branch, the EX stage sends a jump signal (including jump_flag and jump_address) to the IF stage. However, by the time jump_address is written to the program counter (PC), the IF and ID stages may already hold instructions from the wrong path that must not be executed. To address this issue, the corresponding pipeline registers are flushed to clear these invalid instructions.
Compared to the single-cycle design, four new files have been added: PipelineRegister.scala, Control.scala, IF2ID.scala, and ID2EX.scala.
`PipelineRegister.scala` acts as a buffer register between pipeline stages. It splits the combinational logic and, depending on its inputs, either flushes, stalls (holds its current value), or latches a new value.

`Control.scala` determines when to perform a flush based on the jump flag.

`IF2ID.scala` and `ID2EX.scala` each instantiate `PipelineRegister` and pass the outputs of the previous stage to the next stage through it, while providing stall and flush functionality.
Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
IF | add | or | slt | | | | |
ID | | add | or | slt | | | |
EX | | | add | or | slt | | |
MEM | | | | add | or | slt | |
WB | | | | | add | or | slt |
When an instruction in the ID stage needs to read a register that will be written by an instruction still in the EX or MEM stage, a data hazard occurs. As shown in the table above, when the instruction `slt t6, t0, t3` enters the ID stage, the earlier instruction `add t0, t1, t2` is only in the MEM stage and has not yet written its result back. Therefore, the slt instruction encounters a data hazard when reading t0.
Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
IF | add | or | slt | | | | | |
ID | | add | or | slt | slt | | | |
EX | | | add | or | nop | slt | | |
MEM | | | | add | or | nop | slt | |
WB | | | | | add | or | nop | slt |
By inserting nop instructions between the instructions and stalling the PC and IF2ID registers, the slt instruction can correctly read the value of t0. It is crucial to ensure that while keeping the IF and ID stages unchanged, the ID2EX register is cleared to insert a blank instruction ("bubble") in the EX stage. Otherwise, the instruction in the ID stage will continue into the EX stage.
This part of the logic is implemented in `Control.scala`. A data hazard occurs if the source registers (rs1_id, rs2_id) of the instruction in the ID stage depend on the destination registers (rd_ex, rd_mem) of the instructions in the EX or MEM stages. When a data hazard is detected, the PC and the IF2ID register are stalled and the ID2EX register is flushed.
When a jump instruction (jump_flag) is detected, the two instructions that follow it in the pipeline must not be executed. Therefore, the IF2ID and ID2EX registers are flushed.
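The stall and flush decisions can be summarized in a short Python model. This is a sketch under stated assumptions, not the contents of `Control.scala`: the output names (`pc_stall`, `if2id_stall`, `if2id_flush`, `id2ex_flush`) are hypothetical, a jump is assumed to take priority over a data hazard, and a destination of `x0` is assumed never to create a dependency because that register always reads as zero.

```python
def control(jump_flag: bool, rs1_id: int, rs2_id: int, rd_ex: int, rd_mem: int) -> dict:
    """Model of the hazard control decisions for the stall-based pipeline."""
    signals = {
        "pc_stall": False,
        "if2id_stall": False,
        "if2id_flush": False,
        "id2ex_flush": False,
    }
    if jump_flag:
        # Control hazard: discard the two instructions fetched after the jump.
        signals["if2id_flush"] = True
        signals["id2ex_flush"] = True
        return signals

    sources = (rs1_id, rs2_id)
    # Data hazard: a source register matches a pending destination register.
    hazard = any(rd != 0 and rd in sources for rd in (rd_ex, rd_mem))
    if hazard:
        # Hold the PC and IF2ID, and insert a bubble into the EX stage.
        signals["pc_stall"] = True
        signals["if2id_stall"] = True
        signals["id2ex_flush"] = True
    return signals
```

With `slt t6, t0, t3` in ID (`rs1_id = 5`, `rs2_id = 28`) and `add t0, t1, t2` in MEM (`rd_mem = 5`), the model stalls the front end and flushes ID2EX, matching the table above.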
Using stalls can resolve data hazard issues; however, this approach involves a significant amount of bubbling, which reduces execution efficiency. To address this, forwarding can be used instead to transfer data to the dependent instruction, avoiding wasted clock cycles.
clock cycle | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
IF | addi | sub | and | lw | or | | | |
ID | | addi | sub | and | lw | or | | |
EX | | | addi | sub | and | lw | | or |
EX2MEM | | | | addi:x1 | sub:x2 | and:x2 | | |
MEM | | | | addi | sub | and | lw | nop |
MEM2WB | | | | | addi:x1 | sub:x2 | and:x2 | lw:x2 |
WB | | | | | addi | sub | and | lw |
In the example above, the instruction `sub x2, x0, x1` depends on the result of the previous instruction, but the result has not yet been written back to the register file. Through forwarding, the result of the addi instruction can be passed directly from the EX/MEM register to the sub instruction. However, when an instruction needs data that the previous instruction loads from memory, the data only becomes available in the MEM stage, so forwarding cannot immediately resolve this hazard and a stall is still required.
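A minimal Python sketch of the forwarding decision follows. It is an illustrative model, not the author's Chisel code: the function names, the returned labels, and the `reg_write_*` enables are assumptions. The EX2MEM register is checked first because it holds the most recent result.

```python
def forward_select(rs: int, rd_mem: int, reg_write_mem: bool,
                   rd_wb: int, reg_write_wb: bool) -> str:
    """Choose where an EX-stage operand for source register rs comes from."""
    if reg_write_mem and rd_mem != 0 and rd_mem == rs:
        return "EX2MEM"   # newest value: result of the instruction now in MEM
    if reg_write_wb and rd_wb != 0 and rd_wb == rs:
        return "MEM2WB"   # older value: result of the instruction now in WB
    return "REGFILE"      # no pending write, use the register file value


def load_use_stall(ex_is_load: bool, rd_ex: int, rs1_id: int, rs2_id: int) -> bool:
    """A load in EX followed by a dependent instruction in ID still needs a stall."""
    return ex_is_load and rd_ex != 0 and rd_ex in (rs1_id, rs2_id)
```

For `sub x2, x0, x1` directly after an `addi` that writes `x1`, `forward_select(1, 1, True, 0, False)` selects the EX2MEM path, so no bubble is needed.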
Method | Description | Performance |
---|---|---|
Forwarding | Resolves data hazards by directly passing data, avoiding pipeline stalls | Improves instruction throughput but increases hardware complexity. |
Stall | Inserts a bubble to allow the pipeline to wait for data to be ready before proceeding. | Reduces performance but is simpler to implement. |
The M extension instructions use the R-type format. The extension includes the following eight instructions: `remu`, `rem`, `divu`, `div`, `mulhu`, `mulhsu`, `mulh`, and `mul`.
The main distinction between the M extension and standard R-type instructions lies in the value of the funct7 field. For the M extension, the funct7 field is always 0000001.
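To handle this in the ALU control logic, the processing of R-type instructions should first differentiate instructions based on the value of funct7: if funct7 is 0000001, the instruction belongs to the M extension and funct3 selects the operation; otherwise, it is handled as a standard R-type instruction.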
By distinguishing between the M extension and standard R-type instructions at this stage, the ALU control logic can correctly execute the required operation based on the instruction's functionality.
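The dispatch on funct7 might look like the following Python model. The function name and the string operation labels are hypothetical stand-ins for the `alu_funct` values; the funct3 and funct7 assignments are those of the standard RV32I and M encodings.

```python
# funct3 -> operation for funct7 == 0000001 (M extension).
M_OPS = ["mul", "mulh", "mulhsu", "mulhu", "div", "divu", "rem", "remu"]
# funct3 -> operation for the base R-type instructions with funct7 == 0000000.
BASE_OPS = ["add", "sll", "slt", "sltu", "xor", "srl", "or", "and"]


def r_type_alu_funct(funct3: int, funct7: int) -> str:
    """Map the funct3/funct7 fields of an R-type instruction to an ALU operation."""
    if funct7 == 0b0000001:
        return M_OPS[funct3]
    if funct7 == 0b0100000:
        # Only sub and sra use this funct7 value in RV32I.
        if funct3 == 0b000:
            return "sub"
        if funct3 == 0b101:
            return "sra"
    # Illegal funct7 values are not rejected in this simplified model.
    return BASE_OPS[funct3]
```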
First, add the definitions for the M extension instructions in the object section of the `ALU.scala` file.

Then, in the `ALU.scala` file, add the computation logic for each M extension instruction.
Use `mul` to simplify the instructions.
Test the various instructions of the M extension.
This part uses the binary search method to calculate the square root. By using this approach, we can find the integer closest to the square root. This method avoids the complications of floating-point calculations, making it a more straightforward way to compute the square root.
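The idea can be sketched in Python as follows. This is an illustrative version, not the author's assembly program: it returns the largest integer whose square does not exceed the input (the floor of the square root), and the exact rounding used in the original program is not shown in this report.

```python
def isqrt_binary_search(n: int) -> int:
    """Integer square root of a non-negative integer using binary search."""
    if n < 0:
        raise ValueError("n must be non-negative")
    lo, hi = 0, n
    result = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            result = mid   # mid is a valid candidate, try a larger one
            lo = mid + 1
        else:
            hi = mid - 1   # mid is too large
    return result
```

Only integer multiplication and comparison are needed, which is why the M extension's `mul` instruction is sufficient to run this on the CPU.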
The commands use the riscv32-unknown-elf toolchain to convert a .s file into a .asmbin file.
Refer to the test files in the reference documents to complete the test programs for the three-stage and five-stage implementations.
The recommended installation environment from the official website requires Python 3.6. To avoid conflicts with the local environment, I used conda to create a virtual environment:
The above are the installation commands provided in the GitHub repository. However, when following the commands to download the resources from GitHub, I noticed that the riscv-ctg and riscv-isac directories were not present.
I then checked the official RISCOF website for proper installation instructions and found that simply running `pip install riscof` would suffice.
To confirm the installation, I ran the command `riscof --help`.
If the installation was successful, the following message was displayed:
The following installation steps are provided on the official website:
However, during execution, I encountered an error when running the command `git clone --recursive https://github.com/riscv/riscv-gnu-toolchain`.
To resolve this issue, I referred to the riscv-gnu-toolchain GitHub repository for a solution. Instead of using the `--recursive` option, I simply used `git clone https://github.com/riscv/riscv-gnu-toolchain`.
After making this adjustment, I followed the rest of the steps as described above.
Finally, to verify whether the installation was successful, I ran `riscv32-unknown-elf-gcc --version` and checked that it printed the compiler version information.
Using the above command will generate a folder structure as follows:
Below are the changes made to the `riscof_my_dut.py` file. First, I modified the ELF file path, replacing the hardcoded value with a dynamic path (output.elf) generated within the test_dir. Additionally, I added a step to generate a binary file (asmbin) by using the riscv32-unknown-elf-objcopy tool to convert the ELF file into a binary format. The test execution command was also updated by replacing the original simcmd with a new sbt-based command that takes the ELF file and signature file paths as arguments. Finally, I adjusted the execute command by including the objcopy_cmd step and specifying the working directory path (/home/lhh/computer_arch/final_lab/riscv-core) before running the simcmd.
First, the ELF file that has been read is used to determine the memory range of the signature through the following program.
Next, the information in the memory range that was read is extracted and output as a signature file. Initially, I used the following program to directly read the data from the corresponding memory location, but the values read were all zeros.
I then discussed this issue with my classmates and examined the `disass` file in the reference materials. I decided to output the values in memory from address 0 to 30000 and discovered that the memory region from 0 to 4096 was empty. This is because Parameters.scala defines the memory entry address as EntryAddress = 0x1000.U(Parameters.AddrWidth). Therefore, I modified the program to the following form.
The above command is used to run the M extension tests from the riscv-arch-test suite on a custom CPU. The final output will be displayed in a web-based format as shown below.
The above shows the test results for the three-stage pipeline, while the following displays the test results for the five-stage pipeline. Both pipelines successfully passed all the tests.