李皓翔
when(io.jump_flag_id){
pc := io.jump_address_id
}.otherwise{
pc := pc + 4.U
}
This part checks jump_flag_id, which the Execute stage asserts for jal, jalr, and taken B-type branches. If it is set, the address of the next instruction is jump_address_id; otherwise, it is pc + 4.U. The instruction at that address is then fetched from memory and passed to the next stage.
val opcode = io.instruction(6, 0)
val funct3 = io.instruction(14, 12)
val funct7 = io.instruction(31, 25)
val rd = io.instruction(11, 7)
val rs1 = io.instruction(19, 15)
val rs2 = io.instruction(24, 20)
In the Decode stage, the instruction is first split into the opcode, funct3, funct7, rd, rs1, and rs2 fields. The opcode identifies the instruction type, as shown in the table below.
opcode | Instruction Type |
---|---|
011 0011 | R-type |
110 0011 | B-type |
001 0011 | I-type |
010 0011 | S-type |
000 0011 | L-type |
001 0111 | AUIPC |
011 0111 | LUI |
110 1111 | JAL |
110 0111 | JALR |
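As a quick cross-check of the field positions and the opcode table, the decode step can be modeled in plain Scala with no Chisel dependency. This is a minimal sketch. The DecodeModel object and its method names are hypothetical and are not part of the project's sources.

```scala
// Plain-Scala model of field extraction and opcode classification (hypothetical helper).
object DecodeModel {
  // Bits hi..lo (inclusive) of a 32-bit instruction word, shifted down to bit 0.
  def bits(inst: Int, hi: Int, lo: Int): Int =
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  final case class Fields(opcode: Int, rd: Int, funct3: Int, rs1: Int, rs2: Int, funct7: Int)

  def fields(inst: Int): Fields = Fields(
    opcode = bits(inst, 6, 0),
    rd = bits(inst, 11, 7),
    funct3 = bits(inst, 14, 12),
    rs1 = bits(inst, 19, 15),
    rs2 = bits(inst, 24, 20),
    funct7 = bits(inst, 31, 25)
  )

  // Opcode table from the text; the binary opcodes are written in hex.
  val instructionType: Map[Int, String] = Map(
    0x33 -> "R-type", // 011 0011
    0x63 -> "B-type", // 110 0011
    0x13 -> "I-type", // 001 0011
    0x23 -> "S-type", // 010 0011
    0x03 -> "L-type", // 000 0011
    0x17 -> "AUIPC",  // 001 0111
    0x37 -> "LUI",    // 011 0111
    0x6f -> "JAL",    // 110 1111
    0x67 -> "JALR"    // 110 0111
  )

  def classify(inst: Int): String =
    instructionType.getOrElse(fields(inst).opcode, "unknown")
}
```

For example, fields(0x003100B3), the encoding of add x1, x2, x3, gives rd = 1, rs1 = 2, and rs2 = 3, and classify returns "R-type".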
The bit layout of each instruction format is:
Format | 31-25 | 24-20 | 19-15 | 14-12 | 11-7 | 6-0 |
---|---|---|---|---|---|---|
R-type | funct7 | rs2 | rs1 | funct3 | rd | opcode |
I-type | imm[11:5] | imm[4:0] | rs1 | funct3 | rd | opcode |
S-type | imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
B-type | imm[12\|10:5] | rs2 | rs1 | funct3 | imm[4:1\|11] | opcode |
U-type | imm[31:25] | imm[24:20] | imm[19:15] | imm[14:12] | rd | opcode |
J-type | imm[20\|10:5] | imm[4:1\|11] | imm[19:15] | imm[14:12] | rd | opcode |
Depending on the instruction type, the decoder generates the corresponding control signals and assembles the 32-bit immediate from the bit positions above (sign-extended for the I, S, B, and J formats).
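To make the bit shuffling in these formats concrete, the following plain-Scala sketch rebuilds each immediate from a 32-bit instruction word and sign-extends it. ImmediateModel is a hypothetical helper for checking expected values. It is not the project's Chisel decoder.

```scala
// Plain-Scala model of immediate generation for the I, S, B, U, and J formats (hypothetical helper).
object ImmediateModel {
  // Bits hi..lo (inclusive) of x, shifted down to bit 0.
  private def bits(x: Int, hi: Int, lo: Int): Int =
    (x >>> lo) & ((1 << (hi - lo + 1)) - 1)

  // Sign-extend the low `width` bits of x to 32 bits.
  private def signExtend(x: Int, width: Int): Int =
    (x << (32 - width)) >> (32 - width)

  // I-type: imm[11:0] = inst[31:20]; an arithmetic shift sign-extends it.
  def iType(inst: Int): Int = inst >> 20

  // S-type: imm[11:5] = inst[31:25], imm[4:0] = inst[11:7].
  def sType(inst: Int): Int =
    signExtend((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12)

  // B-type: imm[12|10:5] = inst[31:25], imm[4:1|11] = inst[11:7], imm[0] = 0.
  def bType(inst: Int): Int =
    signExtend(
      (bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11) |
        (bits(inst, 30, 25) << 5) | (bits(inst, 11, 8) << 1),
      13
    )

  // U-type: imm[31:12] = inst[31:12]; the low 12 bits are zero.
  def uType(inst: Int): Int = inst & ~0xfff

  // J-type: imm[20|10:1|11|19:12] = inst[31:12], imm[0] = 0.
  def jType(inst: Int): Int =
    signExtend(
      (bits(inst, 31, 31) << 20) | (bits(inst, 19, 12) << 12) |
        (bits(inst, 20, 20) << 11) | (bits(inst, 30, 21) << 1),
      21
    )
}
```

For example, bType(0xFE000CE3L.toInt), the encoding of beq x0, x0, -8, returns -8.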
switch(io.opcode) {
is(InstructionTypes.I) {
io.alu_funct := MuxLookup(
io.funct3,
ALUFunctions.zero,
IndexedSeq(
InstructionsTypeI.addi -> ALUFunctions.add,
InstructionsTypeI.slli -> ALUFunctions.sll,
InstructionsTypeI.slti -> ALUFunctions.slt,
InstructionsTypeI.sltiu -> ALUFunctions.sltu,
InstructionsTypeI.xori -> ALUFunctions.xor,
InstructionsTypeI.ori -> ALUFunctions.or,
InstructionsTypeI.andi -> ALUFunctions.and,
InstructionsTypeI.sri -> Mux(io.funct7(5), ALUFunctions.sra, ALUFunctions.srl)
),
)
}
is(InstructionTypes.RM) {
io.alu_funct := MuxLookup(
io.funct3,
ALUFunctions.zero,
IndexedSeq(
InstructionsTypeR.add_sub -> Mux(io.funct7(5), ALUFunctions.sub, ALUFunctions.add),
InstructionsTypeR.sll -> ALUFunctions.sll,
InstructionsTypeR.slt -> ALUFunctions.slt,
InstructionsTypeR.sltu -> ALUFunctions.sltu,
InstructionsTypeR.xor -> ALUFunctions.xor,
InstructionsTypeR.or -> ALUFunctions.or,
InstructionsTypeR.and -> ALUFunctions.and,
InstructionsTypeR.sr -> Mux(io.funct7(5), ALUFunctions.sra, ALUFunctions.srl)
),
)
}
is(InstructionTypes.B) {
io.alu_funct := ALUFunctions.add
}
is(InstructionTypes.L) {
io.alu_funct := ALUFunctions.add
}
is(InstructionTypes.S) {
io.alu_funct := ALUFunctions.add
}
is(Instructions.jal) {
io.alu_funct := ALUFunctions.add
}
is(Instructions.jalr) {
io.alu_funct := ALUFunctions.add
}
is(Instructions.lui) {
io.alu_funct := ALUFunctions.add
}
is(Instructions.auipc) {
io.alu_funct := ALUFunctions.add
}
}
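In ALUControl, the opcode determines the instruction type, and each instruction is mapped to its alu_funct. For I-type and R-type instructions, funct3 selects the operation. Bit 5 of funct7 (instruction bit 30) selects sub over add for R-type instructions, and arithmetic over logical right shifts for both types. All other instruction types use the add operation: B-type, loads, stores, jal, jalr, lui, and auipc.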
switch(io.func) {
is(ALUFunctions.add) {
io.result := io.op1 + io.op2
}
is(ALUFunctions.sub) {
io.result := io.op1 - io.op2
}
is(ALUFunctions.sll) {
io.result := io.op1 << io.op2(4, 0)
}
is(ALUFunctions.slt) {
io.result := io.op1.asSInt < io.op2.asSInt
}
is(ALUFunctions.xor) {
io.result := io.op1 ^ io.op2
}
is(ALUFunctions.or) {
io.result := io.op1 | io.op2
}
is(ALUFunctions.and) {
io.result := io.op1 & io.op2
}
is(ALUFunctions.srl) {
io.result := io.op1 >> io.op2(4, 0)
}
is(ALUFunctions.sra) {
io.result := (io.op1.asSInt >> io.op2(4, 0)).asUInt
}
is(ALUFunctions.sltu) {
io.result := io.op1 < io.op2
}
}
The ALU applies the operation selected by func to the operands op1 and op2. Shifts use only the low five bits of op2 as the shift amount, and slt/sltu produce 1 or 0.
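The following plain-Scala reference model mirrors the ALU cases above on 32-bit Ints, which can be handy for computing expected values in tests. Operation names are passed as strings purely for illustration; the real design uses the ALUFunctions enumeration. AluModel is a hypothetical name.

```scala
// Plain-Scala reference model of the RV32I ALU operations (hypothetical helper).
object AluModel {
  def execute(func: String, op1: Int, op2: Int): Int = {
    val shamt = op2 & 0x1f // only op2(4, 0) is used as the shift amount
    func match {
      case "add"  => op1 + op2
      case "sub"  => op1 - op2
      case "sll"  => op1 << shamt
      case "slt"  => if (op1 < op2) 1 else 0 // signed comparison
      case "sltu" => if (Integer.compareUnsigned(op1, op2) < 0) 1 else 0
      case "xor"  => op1 ^ op2
      case "or"   => op1 | op2
      case "and"  => op1 & op2
      case "srl"  => op1 >>> shamt // logical: shifts in zeros
      case "sra"  => op1 >> shamt  // arithmetic: shifts in the sign bit
      case _      => 0             // ALUFunctions.zero and unknown values
    }
  }
}
```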
alu.io.op1 := Mux(io.aluop1_source === 1.U, io.instruction_address, io.reg1_data)
alu.io.op2 := Mux(io.aluop2_source === 1.U, io.immediate, io.reg2_data)
alu.io.func := alu_ctrl.io.alu_funct
io.mem_alu_result := alu.io.result
io.if_jump_flag := opcode === Instructions.jal ||
(opcode === Instructions.jalr) ||
(opcode === InstructionTypes.B) && MuxLookup(
funct3,
false.B,
IndexedSeq(
InstructionsTypeB.beq -> (io.reg1_data === io.reg2_data),
InstructionsTypeB.bne -> (io.reg1_data =/= io.reg2_data),
InstructionsTypeB.blt -> (io.reg1_data.asSInt < io.reg2_data.asSInt),
InstructionsTypeB.bge -> (io.reg1_data.asSInt >= io.reg2_data.asSInt),
InstructionsTypeB.bltu -> (io.reg1_data.asUInt < io.reg2_data.asUInt),
InstructionsTypeB.bgeu -> (io.reg1_data.asUInt >= io.reg2_data.asUInt)
)
)
io.if_jump_address := io.immediate + Mux(opcode === Instructions.jalr, io.reg1_data, io.instruction_address)
}
In the Execute module, the ALU and ALUControl are instantiated. The ALU computation itself is handled inside the ALU module, so Execute only needs to drive the ALU input ports and decide whether to jump:
- For branch instructions, the condition selected by funct3 is evaluated on reg1_data and reg2_data, and the jump is taken only if it holds.
- For unconditional jumps (jal and jalr), the jump is executed directly.

In the MemoryAccess stage, memory_read_enable is set to 1 for loads and memory_write_enable is set to 1 for stores. The corresponding read or write is then performed according to funct3.

In the WriteBack stage, regs_write_source selects the value to be written to the register file, which is one of:
- alu_result
- memory_read_data
- instruction_address + 4.U
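The jump decision and the write-back selection can be summarized in a small plain-Scala sketch. The funct3 values are the standard RV32I branch encodings. ExecuteJumpModel and the source label "pc_plus_4" are hypothetical. This model differs from the Execute code above in one respect: the RISC-V specification requires jalr to clear bit 0 of its target address. The model does this; the snippet above does not.

```scala
// Plain-Scala sketch of the Execute jump decision and write-back selection (hypothetical helper).
object ExecuteJumpModel {
  // Standard RV32I funct3 encodings for B-type instructions.
  val Beq = 0x0; val Bne = 0x1; val Blt = 0x4
  val Bge = 0x5; val Bltu = 0x6; val Bgeu = 0x7

  def branchTaken(funct3: Int, rs1: Int, rs2: Int): Boolean = funct3 match {
    case Beq  => rs1 == rs2
    case Bne  => rs1 != rs2
    case Blt  => rs1 < rs2                              // signed
    case Bge  => rs1 >= rs2                             // signed
    case Bltu => Integer.compareUnsigned(rs1, rs2) < 0  // unsigned
    case Bgeu => Integer.compareUnsigned(rs1, rs2) >= 0 // unsigned
    case _    => false
  }

  // jal and branches jump to pc + imm; jalr jumps to rs1 + imm with bit 0 cleared.
  def jumpTarget(isJalr: Boolean, pc: Int, rs1: Int, imm: Int): Int =
    if (isJalr) (rs1 + imm) & ~1 else pc + imm

  // Value written to rd for each write-back source listed above.
  def writeBackValue(source: String, aluResult: Int, memReadData: Int, pc: Int): Int =
    source match {
      case "alu_result"       => aluResult
      case "memory_read_data" => memReadData
      case "pc_plus_4"        => pc + 4 // link address for jal/jalr
      case other              => throw new IllegalArgumentException(s"unknown source: $other")
    }
}
```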
package riscv.core
import chisel3._
import chisel3.util.Cat
import riscv.CPUBundle
import riscv.Parameters
class CPU extends Module {
val io = IO(new CPUBundle)
val regs = Module(new RegisterFile)
val inst_fetch = Module(new InstructionFetch)
val id = Module(new InstructionDecode)
val ex = Module(new Execute)
val mem = Module(new MemoryAccess)
val wb = Module(new WriteBack)
io.deviceSelect := mem.io.memory_bundle
.address(Parameters.AddrBits - 1, Parameters.AddrBits - Parameters.SlaveDeviceCountBits)
inst_fetch.io.jump_address_id := ex.io.if_jump_address
inst_fetch.io.jump_flag_id := ex.io.if_jump_flag
inst_fetch.io.instruction_valid := io.instruction_valid
inst_fetch.io.instruction_read_data := io.instruction
io.instruction_address := inst_fetch.io.instruction_address
regs.io.write_enable := id.io.reg_write_enable
regs.io.write_address := id.io.reg_write_address
regs.io.write_data := wb.io.regs_write_data
regs.io.read_address1 := id.io.regs_reg1_read_address
regs.io.read_address2 := id.io.regs_reg2_read_address
regs.io.debug_read_address := io.debug_read_address
io.debug_read_data := regs.io.debug_read_data
id.io.instruction := inst_fetch.io.instruction
ex.io.instruction := inst_fetch.io.instruction
ex.io.instruction_address := inst_fetch.io.instruction_address
ex.io.reg1_data := regs.io.read_data1
ex.io.reg2_data := regs.io.read_data2
ex.io.immediate := id.io.ex_immediate
ex.io.aluop1_source := id.io.ex_aluop1_source
ex.io.aluop2_source := id.io.ex_aluop2_source
mem.io.alu_result := ex.io.mem_alu_result
mem.io.reg2_data := regs.io.read_data2
mem.io.memory_read_enable := id.io.memory_read_enable
mem.io.memory_write_enable := id.io.memory_write_enable
mem.io.funct3 := inst_fetch.io.instruction(14, 12)
io.memory_bundle.address := Cat(
0.U(Parameters.SlaveDeviceCountBits.W),
mem.io.memory_bundle.address(Parameters.AddrBits - 1 - Parameters.SlaveDeviceCountBits, 0)
)
io.memory_bundle.write_enable := mem.io.memory_bundle.write_enable
io.memory_bundle.write_data := mem.io.memory_bundle.write_data
io.memory_bundle.write_strobe := mem.io.memory_bundle.write_strobe
mem.io.memory_bundle.read_data := io.memory_bundle.read_data
wb.io.instruction_address := inst_fetch.io.instruction_address
wb.io.alu_result := ex.io.mem_alu_result
wb.io.memory_read_data := mem.io.wb_memory_read_data
wb.io.regs_write_source := id.io.wb_reg_write_source
}
In CPU.scala, all components are instantiated and connected together.
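Using the IF2ID and ID2EX pipeline registers, the pipeline is divided into three stages: IF, ID, and EX. The EX stage also performs the memory access and write-back.
The hazards in a pipeline can be divided into data hazards and control hazards.
Data hazards: in the example below, sub reads x1 in the ID stage in cycle 3. In that same cycle, add writes x1 in the combined EX/MEM/WB stage. As a result, sub sees the new value without stalling, provided the register file passes a same-cycle write through to its read ports.
instruction | 1 | 2 | 3 | 4 |
---|---|---|---|---|
add x1, x2, x3 | IF | ID | EX/MEM/WB | |
sub x4, x5, x1 | | IF | ID | EX/MEM/WB |
Control hazards: these are caused by jal, jalr, and taken branches. In these cases, the EX stage sends a jump signal (jump_flag and jump_address) to the IF stage. By the time the jump is resolved in EX, however, the IF and ID stages already hold the two instructions fetched sequentially after the jump. These wrong-path instructions must not complete, so the corresponding pipeline registers (IF2ID and ID2EX) are flushed to discard them.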
Compared to the single-cycle design, four new files have been added: PipelineRegister.scala, Control.scala, IF2ID.scala, and ID2EX.scala.
package riscv.core
import chisel3._
import riscv.Parameters
class PipelineRegister(width: Int = Parameters.DataBits, defaultValue: UInt = 0.U) extends Module {
val io = IO(new Bundle {
val stall = Input(Bool())
val flush = Input(Bool())
val in = Input(UInt(width.W))
val out = Output(UInt(width.W))
})
val myreg = RegInit(UInt(width.W), defaultValue)
val out = RegInit(UInt(width.W), defaultValue)
when(io.flush) {
out := defaultValue
myreg := defaultValue
}
.elsewhen(io.stall) {
out := myreg
}
.otherwise {
myreg := io.in
out := io.in
}
io.out := out
}
This module is the pipeline register placed between two stages. It splits the combinational logic into stages. Depending on its control inputs, it flushes (resets to defaultValue), stalls (holds its current value), or latches a new value from in. Flush has priority over stall.
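The priority in PipelineRegister (flush over stall over update) can be checked with a cycle-level plain-Scala model, where each call to step corresponds to one rising clock edge. PipelineRegisterModel is a hypothetical name. A natural default value for an instruction register is 0x13, the encoding of addi x0, x0, 0 (nop).

```scala
// Cycle-level plain-Scala model of PipelineRegister; step() is one rising clock edge.
final class PipelineRegisterModel(defaultValue: Int) {
  private var myreg = defaultValue // last value latched from `in`
  private var out = defaultValue   // value driven on io.out

  def output: Int = out

  // Same priority as the Chisel when/elsewhen chain: flush, then stall, then update.
  def step(in: Int, stall: Boolean, flush: Boolean): Unit =
    if (flush) {
      out = defaultValue
      myreg = defaultValue
    } else if (stall) {
      out = myreg
    } else {
      myreg = in
      out = in
    }
}
```

For example, after step(1, false, false) the output is 1. A following stall keeps it at 1, and a flush resets it to the default value even if stall is also asserted.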
package riscv.core.threestage
import chisel3._
class Control extends Module {
val io = IO(new Bundle {
val JumpFlag = Input(Bool())
val Flush = Output(Bool())
})
io.Flush := io.JumpFlag
}
This module asserts Flush whenever the jump flag from the EX stage is set, so that the wrong-path instructions held in the pipeline registers are discarded.
///// IF2ID
package riscv.core.threestage
import chisel3._
import riscv.core.PipelineRegister
import riscv.Parameters
class IF2ID extends Module {
val io = IO(new Bundle {
val flush = Input(Bool())
val instruction = Input(UInt(Parameters.InstructionWidth))
val instruction_address = Input(UInt(Parameters.AddrWidth))
val interrupt_flag = Input(UInt(Parameters.InterruptFlagWidth))
val output_instruction = Output(UInt(Parameters.DataWidth))
val output_instruction_address = Output(UInt(Parameters.AddrWidth))
val output_interrupt_flag = Output(UInt(Parameters.InterruptFlagWidth))
})
val stall = false.B
val instruction = Module(new PipelineRegister(defaultValue = InstructionsNop.nop))
instruction.io.in := io.instruction
instruction.io.stall := stall
instruction.io.flush := io.flush
io.output_instruction := instruction.io.out
val instruction_address = Module(new PipelineRegister(defaultValue = ProgramCounter.EntryAddress))
instruction_address.io.in := io.instruction_address
instruction_address.io.stall := stall
instruction_address.io.flush := io.flush
io.output_instruction_address := instruction_address.io.out
val interrupt_flag = Module(new PipelineRegister(Parameters.InterruptFlagBits))
interrupt_flag.io.in := io.interrupt_flag
interrupt_flag.io.stall := stall
interrupt_flag.io.flush := io.flush
io.output_interrupt_flag := interrupt_flag.io.out
}
IF2ID (above) and ID2EX both instantiate one PipelineRegister per signal. They pass the outputs of the previous stage to the next stage and support stall and flush. In IF2ID, stall is tied to false.B because the three-stage design only needs flushing.
add t0, t1, t2
or t3, t4, t5
slt t6, t0, t3
Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
IF | add | or | slt | | | | |
ID | | add | or | slt | | | |
EX | | | add | or | slt | | |
MEM | | | | add | or | slt | |
WB | | | | | add | or | slt |
A data hazard occurs when an instruction in the ID stage reads a register that is written by an instruction still in the EX or MEM stage. As shown in the table above, slt t6, t0, t3 enters the ID stage in cycle 4. At that point, add t0, t1, t2 is only in the MEM stage and or t3, t4, t5 is in the EX stage. Therefore, slt would read stale values of both t0 and t3.
Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
IF | add | or | slt | | | | | | |
ID | | add | or | slt | slt | slt | | | |
EX | | | add | or | nop | nop | slt | | |
MEM | | | | add | or | nop | nop | slt | |
WB | | | | | add | or | nop | nop | slt |
Stalling the PC and IF2ID registers while inserting nop bubbles lets slt read the correct values. The dependency on t3 keeps slt in ID for two cycles. In cycle 4, or is in EX; in cycle 5, it is in MEM, which the Control logic below also treats as a hazard. While the IF and ID stages are held, the ID2EX register must be cleared so that a bubble ("nop") enters the EX stage. Otherwise, the stalled instruction in ID would also advance into EX.
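This logic is implemented in Control.scala. A data hazard occurs when both of the following hold:
- a source register of the instruction in the ID stage (rs1_id or rs2_id) matches the destination register (rd_ex or rd_mem) of an instruction in the EX or MEM stage that writes the register file;
- that destination is not x0.

When a data hazard is detected:
- id_flush clears ID2EX, inserting a bubble into the EX stage;
- pc_stall and if_stall hold the PC and IF2ID, so the IF and ID stages keep their current instructions.

When a jump (jump_flag) is detected:
- if_flush and id_flush clear IF2ID and ID2EX, because the two instructions fetched after the jump are on the wrong path and must not be executed.

The jump check takes priority over the data-hazard check.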
package riscv.core.fivestage_stall
import chisel3._
import riscv.Parameters
class Control extends Module {
val io = IO(new Bundle {
val jump_flag = Input(Bool()) // ex.io.if_jump_flag
val rs1_id = Input(UInt(Parameters.PhysicalRegisterAddrWidth)) // id.io.regs_reg1_read_address
val rs2_id = Input(UInt(Parameters.PhysicalRegisterAddrWidth)) // id.io.regs_reg2_read_address
val rd_ex = Input(UInt(Parameters.PhysicalRegisterAddrWidth)) // id2ex.io.output_regs_write_address
val reg_write_enable_ex = Input(Bool()) // id2ex.io.output_regs_write_enable
val rd_mem = Input(UInt(Parameters.PhysicalRegisterAddrWidth)) // ex2mem.io.output_regs_write_address
val reg_write_enable_mem = Input(Bool()) // ex2mem.io.output_regs_write_enable
val if_flush = Output(Bool())
val id_flush = Output(Bool())
val pc_stall = Output(Bool())
val if_stall = Output(Bool())
})
io.if_flush := false.B
io.id_flush := false.B
io.pc_stall := false.B
io.if_stall := false.B
when(io.jump_flag) {
io.if_flush := true.B
io.id_flush := true.B
}.elsewhen(
(io.reg_write_enable_ex && (io.rd_ex === io.rs1_id || io.rd_ex === io.rs2_id) && io.rd_ex =/= 0.U)
|| (io.reg_write_enable_mem && (io.rd_mem === io.rs1_id || io.rd_mem === io.rs2_id) && io.rd_mem =/= 0.U)
) {
io.id_flush := true.B
io.pc_stall := true.B
io.if_stall := true.B
}
}
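The hazard check in Control can be replayed in plain Scala to see how many bubbles an instruction needs. This sketch uses hypothetical names and is not the project's code. On each stall cycle, the instruction in EX moves to MEM and a bubble is placed in EX, as id_flush does. The example to replay is slt t6, t0, t3 in ID, with or t3, t4, t5 in EX and add t0, t1, t2 in MEM (t0 is x5 and t3 is x28). For this case the model reports two stall cycles, because the dependency on t3 persists while or moves from EX to MEM.

```scala
// Plain-Scala model of the stall/flush decision in fivestage_stall Control (hypothetical helper).
object StallControlModel {
  final case class Decision(ifFlush: Boolean, idFlush: Boolean, pcStall: Boolean, ifStall: Boolean)

  // Destination of an instruction in a later stage: register number and write enable.
  final case class Dest(rd: Int, writes: Boolean)

  val Bubble: Dest = Dest(0, writes = false)

  private def conflicts(d: Dest, rs1: Int, rs2: Int): Boolean =
    d.writes && d.rd != 0 && (d.rd == rs1 || d.rd == rs2)

  def decide(jumpFlag: Boolean, rs1Id: Int, rs2Id: Int, ex: Dest, mem: Dest): Decision =
    if (jumpFlag) Decision(ifFlush = true, idFlush = true, pcStall = false, ifStall = false)
    else if (conflicts(ex, rs1Id, rs2Id) || conflicts(mem, rs1Id, rs2Id))
      Decision(ifFlush = false, idFlush = true, pcStall = true, ifStall = true)
    else Decision(ifFlush = false, idFlush = false, pcStall = false, ifStall = false)

  // Cycles the instruction in ID waits, given the instructions currently in EX and MEM.
  // On each stall, EX moves to MEM and id_flush puts a bubble into EX.
  def stallCycles(rs1: Int, rs2: Int, ex0: Dest, mem0: Dest): Int = {
    var ex = ex0
    var mem = mem0
    var stalls = 0
    while (decide(false, rs1, rs2, ex, mem).pcStall) {
      stalls += 1
      mem = ex
      ex = Bubble
    }
    stalls
  }
}
```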
Stalls resolve data hazards, but they insert many bubbles, which reduces execution efficiency. Forwarding instead passes a result directly to the dependent instruction, avoiding most of the wasted clock cycles.
0000: addi x1, x0, 1
0004: sub x2, x0, x1
0008: and x2, x1, x2
000C: lw x2, 4(x2)
0010: or x3, x1, x2
clock cycle | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|
IF | addi | sub | and | lw | or | | | |
ID | | addi | sub | and | lw | or | or | |
EX | | | addi | sub | and | lw | nop | or |
EX2MEM | | | | addi:x1 | sub:x2 | and:x2 | | |
MEM | | | | addi | sub | and | lw | nop |
MEM2WB | | | | | addi:x1 | sub:x2 | and:x2 | lw:x2 |
WB | | | | | addi | sub | and | lw |
In the example above, sub x2, x0, x1 depends on the result of the preceding addi, which has not yet been written back to the register file. With forwarding, the result of addi is passed directly from the EX2MEM register to sub in the EX stage. Loads are different. When an instruction needs data loaded from memory by the immediately preceding instruction, as or x3, x1, x2 does after lw x2, 4(x2), the data only becomes available at the end of the MEM stage. Forwarding alone cannot resolve this hazard, so a one-cycle stall is still required.
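For each source operand of the instruction in EX, a typical forwarding unit chooses between the register file, the EX2MEM register, and the MEM2WB register. It also detects the load-use case that still needs a stall. The following plain-Scala sketch shows that selection, under the usual assumption that the younger result (EX2MEM) takes priority. It is an illustration with hypothetical names, not the project's forwarding code.

```scala
// Plain-Scala sketch of operand forwarding selection and load-use detection (hypothetical helper).
object ForwardingModel {
  sealed trait Source
  case object RegFile extends Source
  case object FromExMem extends Source // result held in the EX2MEM register
  case object FromMemWb extends Source // result held in the MEM2WB register

  // A later-stage instruction: destination register and whether it writes it.
  final case class Writer(rd: Int, writes: Boolean) {
    def provides(rs: Int): Boolean = writes && rd != 0 && rd == rs
  }

  // Operand source for register `rs` read by the instruction in EX.
  // The younger result (EX2MEM) wins when both stages write the same register.
  def select(rs: Int, exMem: Writer, memWb: Writer): Source =
    if (exMem.provides(rs)) FromExMem
    else if (memWb.provides(rs)) FromMemWb
    else RegFile

  // A load in EX whose rd is read by the instruction in ID needs one bubble,
  // because the loaded data only exists after MEM.
  def loadUseStall(exIsLoad: Boolean, exRd: Int, rs1Id: Int, rs2Id: Int): Boolean =
    exIsLoad && exRd != 0 && (exRd == rs1Id || exRd == rs2Id)
}
```

The example above maps onto this sketch as follows. In cycle 3, sub x2, x0, x1 selects FromExMem for x1 (addi). In cycle 4, and x2, x1, x2 selects FromMemWb for x1 and FromExMem for x2.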
Method | Description | Performance |
---|---|---|
Forwarding | Resolves data hazards by directly passing data, avoiding pipeline stalls | Improves instruction throughput but increases hardware complexity. |
Stall | Inserts a bubble to allow the pipeline to wait for data to be ready before proceeding. | Reduces performance but is simpler to implement. |
The M extension instructions use the R-type format (opcode 011 0011). There are eight of them: mul, mulh, mulhsu, mulhu, div, divu, rem, and remu.
The main distinction between the M extension and standard R-type instructions lies in the funct7 field: for the M extension, funct7 is always 0000001.
To handle this in the ALU control logic, R-type instructions are first separated by the value of funct7:
- if funct7 is 0000001, funct3 selects one of the eight M-extension operations;
- otherwise, funct3 (with funct7 bit 5 for add/sub and srl/sra) selects a standard R-type operation, as before.

By distinguishing the two groups at this stage, the ALU control logic selects the correct operation for each instruction.
is(InstructionTypes.RM) {
when(io.funct7 === "b0000001".U) {
// M Extension
io.alu_funct := MuxLookup(
io.funct3,
ALUFunctions.zero,
IndexedSeq(
InstructionsTypeM.mul -> ALUFunctions.mul,
InstructionsTypeM.mulh -> ALUFunctions.mulh,
InstructionsTypeM.mulhsu -> ALUFunctions.mulhsu,
InstructionsTypeM.mulhum -> ALUFunctions.mulhu, // ALUFunctions defines mulhu, not mulhum
InstructionsTypeM.div -> ALUFunctions.div,
InstructionsTypeM.divu -> ALUFunctions.divu,
InstructionsTypeM.rem -> ALUFunctions.rem,
InstructionsTypeM.remu -> ALUFunctions.remu
)
)
}.otherwise {
// R Type
io.alu_funct := MuxLookup(
io.funct3,
ALUFunctions.zero,
IndexedSeq(
InstructionsTypeR.add_sub -> Mux(io.funct7(5), ALUFunctions.sub, ALUFunctions.add),
InstructionsTypeR.sll -> ALUFunctions.sll,
InstructionsTypeR.slt -> ALUFunctions.slt,
InstructionsTypeR.sltu -> ALUFunctions.sltu,
InstructionsTypeR.xor -> ALUFunctions.xor,
InstructionsTypeR.or -> ALUFunctions.or,
InstructionsTypeR.and -> ALUFunctions.and,
InstructionsTypeR.sr -> Mux(io.funct7(5), ALUFunctions.sra, ALUFunctions.srl)
)
)
}
}
object ALUFunctions extends ChiselEnum {
val zero, add, sub, sll, slt, xor, or, and, srl, sra, sltu, mul, mulh, mulhsu, mulhu, div , divu, rem, remu = Value
}
In ALU.scala, the ALUFunctions enumeration is first extended with the eight M-extension operations, as shown above.
is(ALUFunctions.mul) {
io.result := (io.op1 * io.op2)(31, 0)
}
is(ALUFunctions.mulh) {
io.result := (io.op1.asSInt * io.op2.asSInt >> 32).asUInt
}
is(ALUFunctions.mulhsu) {
io.result := ((io.op1.asSInt *io.op2) >> 32).asUInt
}
is(ALUFunctions.mulhu) {
io.result := ((io.op1 * io.op2 ) >> 32).asUInt
}
is(ALUFunctions.div) {
io.result := Mux(io.op2 === 0.U, "hFFFFFFFF".U, (io.op1.asSInt / io.op2.asSInt).asUInt)
}
is(ALUFunctions.divu) {
io.result := Mux(io.op2 === 0.U, "hFFFFFFFF".U, io.op1 / io.op2)
}
is(ALUFunctions.rem) {
io.result := Mux(io.op2 === 0.U, io.op1, (io.op1.asSInt % io.op2.asSInt).asUInt)
}
is(ALUFunctions.remu) {
io.result := Mux(io.op2 === 0.U, io.op1, io.op1 % io.op2)
}
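In the ALU.scala file, add the computation logic for each M-extension instruction. Division by zero follows the RISC-V specification: div and divu return all ones, while rem and remu return the dividend.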
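The Chisel code delegates signed division and remainder to Chisel's / and % operators, so an independent reference for expected results is useful. The following plain-Scala sketch follows the RV32M rules:
- the mulh variants return the upper 32 bits of the 64-bit product;
- division truncates toward zero;
- division by zero returns all ones for the quotient and the dividend for the remainder;
- the signed overflow case -2^31 / -1 returns -2^31, with remainder 0.

MExtModel is a hypothetical name.

```scala
// Plain-Scala reference model of the RV32M operations on 32-bit Ints (hypothetical helper).
object MExtModel {
  // Zero-extend a 32-bit value to 64 bits (unsigned interpretation).
  private def u(x: Int): Long = x & 0xffffffffL

  def mul(a: Int, b: Int): Int = a * b                                // low 32 bits
  def mulh(a: Int, b: Int): Int = ((a.toLong * b.toLong) >> 32).toInt // signed x signed
  def mulhsu(a: Int, b: Int): Int = ((a.toLong * u(b)) >> 32).toInt   // signed x unsigned
  // unsigned x unsigned: the Long product wraps mod 2^64, but its upper 32 bits stay exact
  def mulhu(a: Int, b: Int): Int = ((u(a) * u(b)) >>> 32).toInt

  def div(a: Int, b: Int): Int =
    if (b == 0) -1                                      // all ones
    else if (a == Int.MinValue && b == -1) Int.MinValue // signed overflow
    else a / b                                          // truncates toward zero

  def divu(a: Int, b: Int): Int =
    if (b == 0) -1 else Integer.divideUnsigned(a, b)

  def rem(a: Int, b: Int): Int =
    if (b == 0) a
    else if (a == Int.MinValue && b == -1) 0
    else a % b // sign follows the dividend

  def remu(a: Int, b: Int): Int =
    if (b == 0) a else Integer.remainderUnsigned(a, b)
}
```

For instance, the values checked by the test programs below follow directly: mul(8, 8) = 64, div(64, 8) = 8, and rem(64, 8) = 0.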
.text
la a0, multiplier # Load multiplier address
lw a1, 0(a0) # Load multiplier value
la a2, multiplicand # Load multiplicand address
lw a3, 0(a2) # Load multiplicand value
li t0, 0 # Initialize accumulator
li t1, 32 # Set bit counter (#A01)
# Check for negative values
bltz a1, handle_negative1 # If multiplier negative (#A02)
j shift_and_add_loop # Skip to main loop (#A05)
bltz a3, handle_negative2 # If multiplicand negative (#A03)
j shift_and_add_loop # Continue to main loop (#A04)
handle_negative1:
neg a1, a1 # Make multiplier positive
handle_negative2:
neg a3, a3 # Make multiplicand positive
shift_and_add_loop:
beqz t1, end_shift_and_add # Exit if bit count is zero
andi t2, a1, 1 # Check least significant bit (#A06)
beqz t2, skip_add # Skip add if bit is 0
add t0, t0, a3 # Add to accumulator
skip_add:
srai a1, a1, 1 # Right shift multiplier
slli a3, a3, 1 # Left shift multiplicand
addi t1, t1, -1 # Decrease bit counter
j shift_and_add_loop # Repeat loop (#A07)
end_shift_and_add:
la a4, result # Load result address
sw t0, 0(a4) # Store final result (#A08)
.text
la a0, multiplier # Load multiplier address
lw a1, 0(a0) # Load multiplier value
la a2, multiplicand # Load multiplicand address
lw a3, 0(a2) # Load multiplicand value
mul t0, a1, a3 # Perform multiplication (t0 = a1 * a3)
la a4, result # Load result address
sw t0, 0(a4) # Store final result
Using mul replaces the entire shift-and-add loop with a single instruction.
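The shift-and-add loop can be modeled in plain Scala to compare it against mul. Every addition and shift wraps modulo 2^32, so examining all 32 multiplier bits already yields the low 32 bits of the signed product. That is exactly what mul returns, which means the sign handling in the assembly is not needed for a correct 32-bit result. ShiftAddModel is a hypothetical name.

```scala
// Plain-Scala model of the shift-and-add multiplication loop (hypothetical helper).
object ShiftAddModel {
  def multiply(multiplier: Int, multiplicand: Int): Int = {
    var m = multiplier        // a1
    var addend = multiplicand // a3
    var acc = 0               // t0, the accumulator
    for (_ <- 0 until 32) {   // t1 counts the 32 multiplier bits
      if ((m & 1) != 0) acc += addend // andi + beqz + add
      m >>= 1                 // srai: arithmetic right shift
      addend <<= 1            // slli
    }
    acc
  }
}
```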
.globl _start
_start:
# Set up the initial value for a0
addi a0, x0, 8 # a0 = 8
# Multiply a0 by itself (a1 = a0 * a0)
mul a1, a0, a0 # a1 = a0 * a0
# Division of a1 by a0 (a2 = a1 / a0)
div a2, a1, a0 # a2 = a1 / a0 (integer division)
# Unsigned division of a1 by a0 (a3 = a1 / a0)
divu a3, a1, a0 # a3 = a1 / a0 (unsigned division)
# Remainder of a1 divided by a0 (a4 = a1 % a0)
rem a4, a1, a0 # a4 = a1 % a0 (remainder)
# Unsigned remainder of a1 divided by a0 (a5 = a1 % a0)
remu a5, a1, a0 # a5 = a1 % a0 (unsigned remainder)
loop:
j loop
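This program exercises the M-extension instructions mul, div, divu, rem, and remu. With a0 = 8, the expected results are a1 = 64, a2 = a3 = 8, and a4 = a5 = 0.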
.globl _start
_start:
addi t1, x0, 0     # t1 = low = 0
addi t2, x0, 100   # t2 = high = N
addi t0, x0, 100   # t0 = N = 100
addi t5, x0, 2     # t5 = 2 (divisor for the midpoint)
binary_search:
bgt t1, t2, loop
# mid = (low + high) / 2
add t3, t1, t2 # t3 = low + high
div t3, t3, t5 # t3 = mid = (low + high) / 2
mul t4, t3, t3 # t4 = mid * mid
blt t4, t0, set_low
beq t4, t0, set_result
add t2, t3, x0
addi t2, t2, -1
j binary_search
set_low:
# low = mid + 1
addi t1, t3, 1
j binary_search
set_result:
add t1, t3, x0     # t1 = mid (exact square root)
loop:
j loop
This program computes the integer square root of N = 100 by binary search over [0, N]. When N is a perfect square, the loop stops at an exact match and leaves the root in t1. Otherwise, it exits once low exceeds high and leaves floor(sqrt(N)) in t2, the largest integer whose square does not exceed N. Because it uses only integer mul and div, this method avoids floating-point arithmetic entirely.
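The same binary search can be written in plain Scala to check expected results for other inputs. This sketch (BinarySearchSqrt is a hypothetical name) squares mid in 64 bits. The assembly's mul keeps only the low 32 bits, which is fine for small inputs such as 100.

```scala
// Plain-Scala model of the binary-search integer square root (hypothetical helper).
object BinarySearchSqrt {
  def isqrt(n: Int): Int = {
    require(n >= 0, "n must be non-negative")
    var low = 0  // t1
    var high = n // t2
    while (low <= high) {
      val mid = low + (high - low) / 2 // same value as (low + high) / 2, without overflow
      val sq = mid.toLong * mid        // t4, computed in 64 bits here
      if (sq == n) return mid          // set_result: exact square root
      else if (sq < n) low = mid + 1   // set_low
      else high = mid - 1
    }
    high // no exact match: high = floor(sqrt(n))
  }
}
```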
riscv32-unknown-elf-as -o test.o test.s
riscv32-unknown-elf-ld -o test.elf -T link.lds test.o
riscv32-unknown-elf-objcopy -O binary test.elf test.asmbin
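These commands use the riscv32-unknown-elf toolchain to build a raw binary for the tests. They assemble test.s, link it with the linker script link.lds, and extract the binary image test.asmbin that TestTopModule loads.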
it should "test multiplication" in {
test(new TestTopModule("test.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) {
c =>
c.clock.step(1000)
c.io.regs_debug_read_address.poke(5.U)
c.io.regs_debug_read_data.expect(BigInt("4294967233").U)
val realData = c.io.regs_debug_read_data.peek().litValue
// Reinterpret the unsigned 32-bit register value as a signed integer
val signedData = if (realData >= (1L << 31)) {
realData - (1L << 32)
} else {
realData
}
println(s"[Check t0] real = $signedData")
c.clock.step(1)
}
}
it should "test multiplication, division, unsigned division, and remainders with an infinite loop" in {
test(new TestTopModule("test2.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) {
c =>
// Set the simulation clock steps
c.clock.step(1000)
// Verify register a0 (should be 8)
c.io.regs_debug_read_address.poke(10.U) // a0 corresponds to register number 10
c.io.regs_debug_read_data.expect(8.U)
c.clock.step()
// Verify register a1 (should be 8 * 8 = 64)
c.io.regs_debug_read_address.poke(11.U) // a1 corresponds to register number 11
c.io.regs_debug_read_data.expect(64.U)
c.clock.step()
// Verify register a2 (should be a1 / a0 = 64 / 8 = 8)
c.io.regs_debug_read_address.poke(12.U) // a2 corresponds to register number 12
c.io.regs_debug_read_data.expect(8.U)
c.clock.step()
// Verify register a3 (should be a1 / a0 using unsigned division = 64 / 8 = 8)
c.io.regs_debug_read_address.poke(13.U) // a3 corresponds to register number 13
c.io.regs_debug_read_data.expect(8.U)
c.clock.step()
// Verify register a4 (should be a1 % a0 = 64 % 8 = 0)
c.io.regs_debug_read_address.poke(14.U) // a4 corresponds to register number 14
c.io.regs_debug_read_data.expect(0.U)
c.clock.step()
// Verify register a5 (should be a1 % a0 using unsigned remainder = 64 % 8 = 0)
c.io.regs_debug_read_address.poke(15.U) // a5 corresponds to register number 15
c.io.regs_debug_read_data.expect(0.U)
// Check that the simulator enters an infinite loop
c.clock.step(1) // Advance one more clock cycle to confirm system stability
}
}
it should "correctly calculate the integer square root of 100" in {
test(new TestTopModule("test3.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
for (_ <- 0 until 1000) {
c.clock.step()
}
c.io.regs_debug_read_address.poke(6.U)
c.clock.step()
c.io.regs_debug_read_data.expect(10.U)
}
}
it should "test multiplication" in {
test(new TestTopModule("test.asmbin", ImplementationType.FiveStageFinal)).withAnnotations(TestAnnotations.annos) {
c =>
c.clock.step(1000)
c.io.regs_debug_read_address.poke(5.U)
c.io.regs_debug_read_data.expect(BigInt("4294967233").U)
val realData = c.io.regs_debug_read_data.peek().litValue
// Reinterpret the unsigned 32-bit register value as a signed integer
val signedData = if (realData >= (1L << 31)) {
realData - (1L << 32)
} else {
realData
}
println(s"[Check t0] real = $signedData")
c.clock.step(1)
}
}
it should "test multiplication, division, unsigned division, and remainders with an infinite loop" in {
test(new TestTopModule("test2.asmbin", ImplementationType.FiveStageFinal)).withAnnotations(TestAnnotations.annos) {
c =>
// Set the simulation clock steps
c.clock.step(1000)
// Verify register a0 (should be 8)
c.io.regs_debug_read_address.poke(10.U) // a0 corresponds to register number 10
c.io.regs_debug_read_data.expect(8.U)
c.clock.step()
// Verify register a1 (should be 8 * 8 = 64)
c.io.regs_debug_read_address.poke(11.U) // a1 corresponds to register number 11
c.io.regs_debug_read_data.expect(64.U)
c.clock.step()
// Verify register a2 (should be a1 / a0 = 64 / 8 = 8)
c.io.regs_debug_read_address.poke(12.U) // a2 corresponds to register number 12
c.io.regs_debug_read_data.expect(8.U)
c.clock.step()
// Verify register a3 (should be a1 / a0 using unsigned division = 64 / 8 = 8)
c.io.regs_debug_read_address.poke(13.U) // a3 corresponds to register number 13
c.io.regs_debug_read_data.expect(8.U)
c.clock.step()
// Verify register a4 (should be a1 % a0 = 64 % 8 = 0)
c.io.regs_debug_read_address.poke(14.U) // a4 corresponds to register number 14
c.io.regs_debug_read_data.expect(0.U)
c.clock.step()
// Verify register a5 (should be a1 % a0 using unsigned remainder = 64 % 8 = 0)
c.io.regs_debug_read_address.poke(15.U) // a5 corresponds to register number 15
c.io.regs_debug_read_data.expect(0.U)
// Check that the simulator enters an infinite loop
c.clock.step(1) // Advance one more clock cycle to confirm system stability
}
}
it should "correctly calculate the integer square root of 100" in {
test(new TestTopModule("test3.asmbin", ImplementationType.FiveStageFinal)).withAnnotations(TestAnnotations.annos) { c =>
for (_ <- 0 until 1000) {
c.clock.step()
}
c.io.regs_debug_read_address.poke(6.U)
c.clock.step()
c.io.regs_debug_read_data.expect(10.U)
}
}
Refer to the test files in the reference documents to complete the test programs for the three-stage and five-stage implementations.
The recommended installation environment from the official website requires Python 3.6. To avoid conflicts with the local environment, I used conda to create a virtual environment. Later, conda deactivate leaves the environment and conda remove --name RISCOF --all deletes it.
$ source ~/miniconda3/bin/activate
$ conda create -n RISCOF python=3.6
$ conda activate RISCOF
$ pip3 install git+https://github.com/riscv/riscof.git
$ cd riscv-ctg
$ pip3 install --editable .
$ cd riscv-isac
$ pip3 install --editable .
The above are the installation commands provided in the GitHub repository. However, when following the commands to download the resources from GitHub, I noticed that the riscv-ctg and riscv-isac directories were not present.
I then checked the official RISCOF website for installation instructions and found that simply running pip install riscof was sufficient.
To confirm the installation, I ran riscof --help.
If the installation succeeded, the following help message is displayed:
Usage: riscof [OPTIONS] COMMAND [ARGS]...
Options:
--version Show the version and exit.
-v, --verbose [info|error|debug]
Set verbose level
--help Show this message and exit.
Commands:
arch-test Setup and maintenance for Architectural TestSuite.
coverage Run the tests on DUT and reference and compare signatures
gendb Generate Database for the Suite.
run Run the tests on DUT and reference and compare signatures
setup Initiate Setup for riscof.
testlist Generate the test list for the given DUT and suite.
validateyaml Validate the Input YAMLs using riscv-config.
The following installation steps are provided on the official website:
$ sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev \
libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool \
patchutils bc zlib1g-dev libexpat-dev
$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
$ git clone --recursive https://github.com/riscv/riscv-opcodes.git
$ cd riscv-gnu-toolchain
$ ./configure --prefix=/path/to/install --with-arch=rv32gc --with-abi=ilp32d # for 32-bit toolchain
$ [sudo] make # sudo is required depending on the path chosen in the previous setup
However, I encountered an error when running git clone --recursive https://github.com/riscv/riscv-gnu-toolchain.
To resolve this issue, I referred to the riscv-gnu-toolchain GitHub repository and cloned it without the --recursive option: git clone https://github.com/riscv/riscv-gnu-toolchain
After making this adjustment, I followed the rest of the steps as described above.
Finally, to verify that the installation was successful, I ran riscv32-unknown-elf-gcc --version, which printed:
lhh@lhh-OptiPlex-Tower-Plus-7020:~$ riscv32-unknown-elf-gcc --version
riscv32-unknown-elf-gcc () 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ sudo apt-get install device-tree-compiler
$ git clone https://github.com/riscv-software-src/riscv-isa-sim.git
$ cd riscv-isa-sim
$ mkdir build
$ cd build
$ ../configure --prefix=/path/to/install
$ make
$ [sudo] make install
$ sudo apt-get install libgmp-dev pkg-config zlib1g-dev curl
$ curl --location https://github.com/rems-project/sail/releases/download/0.18-linux-binary/sail.tar.gz | [sudo] tar xvz --directory=/path/to/install --strip-components=1
$ git clone https://github.com/riscv/sail-riscv.git
$ cd sail-riscv
$ ARCH=RV32 make
$ ARCH=RV64 make
riscof setup --refname=sail_cSim --dutname=my_dut
Running the above command generates the following folder structure:
├── config.ini                # configuration file for riscof
├── my_dut/                   # DUT plugin templates
│   ├── env
│   │   ├── link.ld           # DUT linker script
│   │   └── model_test.h      # DUT specific header file
│   ├── riscof_my_dut.py      # DUT python plugin
│   ├── my_dut_isa.yaml       # DUT ISA yaml based on riscv-config
│   └── my_dut_platform.yaml  # DUT Platform yaml based on riscv-config
└── sail_cSim/                # reference plugin templates
    ├── env
    │   ├── link.ld           # Reference linker script
    │   └── model_test.h      # Reference model specific header file
    ├── __init__.py
    └── riscof_sail_cSim.py   # Reference model python plugin.
Below are the changes made to the riscof_my_dut.py file:
- I replaced the hardcoded ELF file path with a dynamic path (output.elf) inside test_dir.
- I added a step that generates a binary file (asmbin) by converting the ELF file with the riscv32-unknown-elf-objcopy tool.
- I replaced the original simcmd with an sbt-based command that takes the ELF file and signature file paths as arguments.
- I adjusted the execute command so that it runs objcopy_cmd and switches to the working directory (/home/lhh/computer_arch/final_lab/riscv-core) before running simcmd.
    def runTests(self, testList):
        # Delete Makefile if it already exists.
        if os.path.exists(self.work_dir+ "/Makefile." + self.name[:-1]):
            os.remove(self.work_dir+ "/Makefile." + self.name[:-1])
        # create an instance the makeUtil class that we will use to create targets.
        make = utils.makeUtil(makefilePath=os.path.join(self.work_dir, "Makefile." + self.name[:-1]))
        # set the make command that will be used. The num_jobs parameter was set in the __init__
        # function earlier
        make.makeCommand = 'make -k -j' + self.num_jobs
        # we will iterate over each entry in the testList. Each entry node will be refered to by the
        # variable testname.
        for testname in testList:
            # for each testname we get all its fields (as described by the testList format)
            testentry = testList[testname]
            # we capture the path to the assembly file of this test
            test = testentry['test_path']
            # capture the directory where the artifacts of this test will be dumped/created. RISCOF is
            # going to look into this directory for the signature files
            test_dir = testentry['work_dir']
            # name of the elf file after compilation of the test
-           elf = 'my.elf'
+           elf = os.path.join(test_dir, 'output.elf')
            # name of the signature file as per requirement of RISCOF. RISCOF expects the signature to
            # be named as DUT-<dut-name>.signature. The below variable creates an absolute path of
            # signature file.
            sig_file = os.path.join(test_dir, self.name[:-1] + ".signature")
            # for each test there are specific compile macros that need to be enabled. The macros in
            # the testList node only contain the macros/values. For the gcc toolchain we need to
            # prefix with "-D". The following does precisely that.
            compile_macros= ' -D' + " -D".join(testentry['macros'])
            # substitute all variables in the compile command that we created in the initialize
            # function
            cmd = self.compile_cmd.format(testentry['isa'].lower(), self.xlen, test, elf, compile_macros)
+           asmbin = os.path.join(test_dir, 'output.asmbin')
+           objcopy_cmd = f"riscv32-unknown-elf-objcopy -O binary {elf} {asmbin}"
            # if the user wants to disable running the tests and only compile the tests, then
            # the "else" clause is executed below assigning the sim command to simple no action
            # echo statement.
            if self.target_run:
                # set up the simulation command. Template is for spike. Please change.
-               simcmd = self.dut_exe + ' --isa={0} +signature={1} +signature-granularity=4 {2}'.format(self.isa, sig_file, elf)
+               simcmd = f'sbt -DelfFile={elf} -DsignatureFile={sig_file} "testOnly riscv.mycputest"'
            else:
                simcmd = 'echo "NO RUN"'
            # concatenate all commands that need to be executed within a make-target.
-           execute = '@cd {0}; {1}; {2};'.format(testentry['work_dir'], cmd, simcmd)
+           execute = '@cd {0}; {1}; {2}; cd {3}; {4};'.format(testentry['work_dir'], cmd, objcopy_cmd, "/home/lhh/computer_arch/final_lab/riscv-core", simcmd)
            # create a target. The makeutil will create a target with the name "TARGET<num>" where num
            # starts from 0 and increments automatically for each new target that is added
            make.add_target(execute)
        # if you would like to exit the framework once the makefile generation is complete uncomment the
        # following line. Note this will prevent any signature checking or report generation.
        #raise SystemExit
        # once the make-targets are done and the makefile has been created, run all the targets in
        # parallel using the make command set above.
        make.execute_all(self.work_dir)
        # if target runs are not required then we simply exit as this point after running all
        # the makefile targets.
        if not self.target_run:
            raise SystemExit(0)
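To see what each make target ends up running, the sketch below rebuilds the execute string of the modified runTests in plain Python. The build_execute helper, the "echo compile" stand-in for the real compile command, and the example paths are hypothetical; only the objcopy and sbt command shapes are taken from the diff above.

```python
import os


def build_execute(test_dir: str, compile_cmd: str, core_dir: str, sig_name: str) -> str:
    """Hypothetical helper mirroring the modified runTests: compile, objcopy, then sbt."""
    elf = os.path.join(test_dir, "output.elf")
    asmbin = os.path.join(test_dir, "output.asmbin")
    sig_file = os.path.join(test_dir, sig_name)
    objcopy_cmd = f"riscv32-unknown-elf-objcopy -O binary {elf} {asmbin}"
    simcmd = f'sbt -DelfFile={elf} -DsignatureFile={sig_file} "testOnly riscv.mycputest"'
    # The leading '@' tells make not to echo the command line before running it.
    return "@cd {0}; {1}; {2}; cd {3}; {4};".format(
        test_dir, compile_cmd, objcopy_cmd, core_dir, simcmd
    )


print(build_execute("/tmp/work/add-01", "echo compile", "/path/to/riscv-core", "DUT-my_dut.signature"))
```

Because sbt is started from the Chisel project directory, the ELF and signature paths only stay valid after the second cd if test_dir is an absolute path.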
First, the Scala test looks up symbol addresses in the ELF file with the following function, which runs objdump -t and parses the address of the requested symbol. These addresses determine the memory range of the signature.
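import scala.sys.process._

// Look up a symbol's address by parsing the `objdump -t` output for the given ELF file.
def extractSymbolAddress(elfFile: String, objdumpPath: String, symbolName: String): BigInt = {
  val symbolsCmd = s"$objdumpPath -t $elfFile"
  val symbolsOutput = symbolsCmd.!!
  val symbolLine = symbolsOutput
    .split("\n")
    // Match the last column exactly so that "begin_signature" cannot match a longer name.
    .find(_.trim.split("\\s+").lastOption.contains(symbolName))
    .getOrElse(throw new RuntimeException(s"Symbol $symbolName not found in $elfFile."))
  // The first column of a symbol-table line is the address in hexadecimal.
  BigInt(symbolLine.trim.split("\\s+")(0), 16)
}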
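The same lookup can be sketched in Python to make the parsing concrete. objdump -t prints one symbol per line, with the address in the first column and the symbol name in the last. The sample lines below are illustrative (addresses made up), and begin_signature/end_signature are assumed to be the labels that the riscv-arch-test model header typically places around the signature area. The signature length in 32-bit words is then (end - begin) / 4.

```python
def symbol_address(objdump_output: str, symbol_name: str) -> int:
    """Return the address of symbol_name from `objdump -t` output (hex, first column)."""
    for line in objdump_output.splitlines():
        fields = line.split()
        # Compare the last column exactly so a longer name with the same prefix is not matched.
        if fields and fields[-1] == symbol_name:
            return int(fields[0], 16)
    raise KeyError(f"Symbol {symbol_name} not found")


# Illustrative objdump -t excerpt; the addresses are made up.
SAMPLE = """\
SYMBOL TABLE:
00002000 g       .data\t00000000 begin_signature
00002110 g       .data\t00000000 end_signature
00000000 g       .text.init\t00000000 rvtest_entry_point
"""

begin = symbol_address(SAMPLE, "begin_signature")
end = symbol_address(SAMPLE, "end_signature")
signature_words = (end - begin) // 4
print(hex(begin), hex(end), signature_words)
```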
Next, the words in that memory range are read out and written to the signature file. My first attempt read each word directly from the corresponding address, as shown below, but every value came back as zero.
val signatureData = (0 until signatureWords.toInt).map { i =>
  val address = beginSignature + (i * 4)
  c.io.mem_debug_read_address.poke(address.U)
  c.clock.step()
  val data = c.io.mem_debug_read_data.peek().litValue
  writer.println(f"$data%08x")
}
I then discussed the problem with my classmates and examined the disass file in the reference materials. To locate the data, I dumped memory from address 0 to 30000 and found that the region from 0 to 4096 (0x1000) was empty. The reason is that Parameters.scala sets the program entry address to EntryAddress = 0x1000.U(Parameters.AddrWidth), so the loaded program, including the signature region, sits 0x1000 bytes above the addresses recorded in the ELF. I therefore added this offset to every read address:
(0 until signatureWords.toInt).foreach { i =>
  // Add EntryAddress (0x1000 = 4096 from Parameters.scala) to the address taken from the ELF.
  val address = beginSignature + (i * 4) + 4096
  c.io.mem_debug_read_address.poke(address.U)
  c.clock.step()
  val data = c.io.mem_debug_read_data.peek().litValue
  writer.printf("%08x\n", data.toLong)
}
writer.close()
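The effect of the missing offset can be reproduced with a toy memory model: if the image is placed at EntryAddress (0x1000) but the signature is read at the address recorded in the ELF, every read lands in an untouched region and returns zero. The sketch below is a simplified model in which a dict stands in for the CPU's data memory; read_word, dump_signature, and the example values are hypothetical. Each word is written as eight lowercase hex digits per line, matching the writer.printf format above.

```python
ENTRY_ADDRESS = 0x1000  # EntryAddress from Parameters.scala


def read_word(memory: dict, address: int) -> int:
    """Toy data memory: addresses that were never written read as zero."""
    return memory.get(address, 0)


def dump_signature(memory: dict, begin: int, words: int, offset: int) -> str:
    """Read `words` 32-bit words starting at begin + offset, one %08x value per line."""
    lines = []
    for i in range(words):
        data = read_word(memory, begin + i * 4 + offset)
        lines.append(f"{data:08x}")
    return "\n".join(lines) + "\n"


# Assume the ELF says the signature starts at 0x2000 and the image was loaded
# ENTRY_ADDRESS bytes higher, so the values actually live at 0x3000 onward.
begin_signature = 0x2000
memory = {begin_signature + ENTRY_ADDRESS + 4 * i: v
          for i, v in enumerate([0xDEADBEEF, 0x1, 0xFFFFFFFF])}

print(dump_signature(memory, begin_signature, 3, offset=0))              # all zeros
print(dump_signature(memory, begin_signature, 3, offset=ENTRY_ADDRESS))  # real values
```

Hardcoding 4096 works for this core, but deriving the offset from EntryAddress would keep the test correct if the entry point ever changes.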
riscof run --config=config.ini --suite=riscv-arch-test/riscv-test-suite/rv32i_m/M --env=riscv-arch-test/riscv-test-suite/env
The command above runs the M-extension tests (rv32i_m/M) from the riscv-arch-test suite against the custom CPU. RISCOF presents the final results as an HTML report, shown below.
The report above shows the results for the three-stage pipeline, and the one below shows the results for the five-stage pipeline. Both pipelines passed all the tests.
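For each test, RISCOF counts the DUT as passing when its signature matches the one produced by the Sail reference model. The sketch below is a simplified, hypothetical version of that check (compare_signatures is not part of RISCOF's API). It reports the first few mismatching words, which makes systematic problems, such as the all-zero output caused by the missing 0x1000 offset, easy to spot.

```python
def compare_signatures(dut: str, ref: str, max_report: int = 5):
    """Compare two signature files' contents word by word.

    Returns (passed, mismatches), where mismatches lists (word_index, dut_word, ref_word).
    A word missing on either side is reported as None. Words are compared case-insensitively.
    """
    dut_words = [w.strip().lower() for w in dut.splitlines() if w.strip()]
    ref_words = [w.strip().lower() for w in ref.splitlines() if w.strip()]
    mismatches = []
    for i in range(max(len(dut_words), len(ref_words))):
        d = dut_words[i] if i < len(dut_words) else None
        r = ref_words[i] if i < len(ref_words) else None
        if d != r:
            mismatches.append((i, d, r))
    return not mismatches, mismatches[:max_report]


ok, diffs = compare_signatures("00000000\n00000000\n", "deadbeef\n00000001\n")
print(ok, diffs)
```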