賴傑南
In Chisel-based RISC-V processor designs, RV32E and RV32IM are two distinct RISC-V architecture variants. The primary differences lie in the number of registers and the supported instruction sets.
Feature | RV32E | RV32IM |
---|---|---|
Number of Registers | 16 (x0 - x15) | 32 (x0 - x31) |
Bit Width | 32-bit | 32-bit |
Instruction Set | RV32I | RV32I + M Extension |
Target Applications | Ultra-low power embedded systems | General embedded and compute-intensive applications |
Computation Ability | Basic operations | Supports multiplication/division |
Hardware Resources | Minimal | More |
Power Consumption | Lower | Higher |
Source code: rave32, an unpipelined RV32E (RISC-V 32-bit, embedded variant) CPU written in Chisel.
I forked the original repository and pushed my RV32IM modifications to a new branch named RV32IM.
The modifications for RV32IM are:
- Added the M-extension instructions (`MUL`, `MULH`, `MULHSU`, `MULHU`, `DIV`, `DIVU`, `REM`, `REMU`).
- Added the `MUL`, `MULH`, `MULHSU`, `MULHU`, `DIV`, `DIVU`, `REM`, and `REMU` operation codes.
- The width of `rs1`, `rs2`, and `rd` has been increased from `3.W` to `5.W`.
- Updated `AluOp` and `InstFormat`.

The memory architecture and access methods are the same for both RV32E and RV32IM.
Instructions like `LW` (Load Word), `SW` (Store Word), as well as `LB`, `LH`, etc., are supported in both RV32E and RV32IM.
M extension (multiplication and division) instructions do not involve memory operations.
`Memory.scala` is responsible for data access rather than arithmetic operations.
The memory module only cares about:
- the address (`addr`)
- the data (`writeData`, `readData`)
- the memory operation (`MemOp`)

Multiplication and division instructions executed by the ALU do not require additional memory access, hence no changes are needed for `Memory.scala`.
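`OneCycle.scala` mainly handles:
- instruction memory (`imem`)
- the decoder (`decoder`)
- the ALU (`alu`)
- data memory (`dmem`)
- the register file (`RegFile`)

Added M extended ALU test program
- `MUL` Test Instruction: operands from `-10` to `9`.
- `MULH` Test Instruction: operands from `-10` to `9`.
- `MULHSU` Test Instruction: operands from `-10` to `9`.
- `MULHU` Test Instruction: operands from `-10` to `9`.
- `DIV` Test Instruction: operands from `-10` to `9`.
- `DIVU` Test Instruction: operands from `0` to `19`.
- `REM` Test Instruction: operands from `-10` to `9`.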
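
Test high register (x31), Write to and Read from Higher Registers
- Tests writing to and reading from the highest register (`x31`) in the register file.
- Set `writeEnable` to `true`.
- Write `42` to register `x31` (`rd` set to `31`).
- Set `rs1` to `31` to read the value of `x31`.
- Check that the value read back is the value written (`42`).

Removed or commented out the original "not decode MUL" test case.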
Added M-extension test cases to the decoder. The following tests were added to verify the decoding functionality of the `Decoder` module. Each test ensures that the corresponding R-type instruction is decoded correctly, with the proper ALU operation and register fields. For example, the `MUL` decode test checks that the `Decoder` correctly decodes the `MUL` instruction.

Test | Instruction | Expected ALU operation | `rd` | `rs1`, `rs2` |
---|---|---|---|---|
`MUL` Decode Test | `mul x1, x2, x3` | `AluOp.MUL` | `x1` | `x2`, `x3` |
`MULH` Decode Test | `mulh x1, x2, x3` | `AluOp.MULH` | `x1` | `x2`, `x3` |
`MULHSU` Decode Test | `mulhsu x1, x2, x3` | `AluOp.MULHSU` | `x1` | `x2`, `x3` |
`MULHU` Decode Test | `mulhu x1, x2, x3` | `AluOp.MULHU` | `x1` | `x2`, `x3` |
`DIV` Decode Test | `div x1, x2, x3` | `AluOp.DIV` | `x1` | `x2`, `x3` |
`DIVU` Decode Test | `divu x1, x2, x3` | `AluOp.DIVU` | `x1` | `x2`, `x3` |
`REM` Decode Test | `rem x1, x2, x3` | `AluOp.REM` | `x1` | `x2`, `x3` |
`REMU` Decode Test | `remu x1, x2, x3` | `AluOp.REMU` | `x1` | `x2`, `x3` |

Because the `RISCVAssembler` cannot assemble M-extension instructions, they need to be added to the assembler.
Because Memory.scala has not been modified, MemorySpec.scala does not need to be modified.
Added the M-extension instructions and the conversion of RISC-V assembly language instructions into the corresponding machine code.
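
As a rough illustration of that assembly-to-machine-code conversion, here is a minimal Python sketch of an R-type encoder for the eight M-extension instructions. The opcode, `funct7`, and `funct3` values are the standard RV32M encodings; the names `encode_m` and `assemble` are hypothetical and are not taken from the repository, whose assembler is written in Scala.

```python
# Sketch only: encode RV32M R-type instructions into 32-bit machine code.
# encode_m / assemble are illustrative names, not the repository's API.

OPCODE_OP = 0b0110011      # R-type register-register opcode
FUNCT7_MULDIV = 0b0000001  # funct7 shared by all RV32M instructions

FUNCT3 = {
    "mul": 0b000, "mulh": 0b001, "mulhsu": 0b010, "mulhu": 0b011,
    "div": 0b100, "divu": 0b101, "rem": 0b110, "remu": 0b111,
}


def encode_m(mnemonic: str, rd: int, rs1: int, rs2: int) -> int:
    """Pack the fields as funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError(f"register index out of range: {reg}")
    return (
        (FUNCT7_MULDIV << 25) | (rs2 << 20) | (rs1 << 15)
        | (FUNCT3[mnemonic] << 12) | (rd << 7) | OPCODE_OP
    )


def assemble(line: str) -> int:
    """Assemble one line of the form 'mul x1, x2, x3'."""
    mnemonic, operands = line.split(maxsplit=1)
    rd, rs1, rs2 = (int(tok.strip().lstrip("x")) for tok in operands.split(","))
    return encode_m(mnemonic.lower(), rd, rs1, rs2)


print(hex(assemble("mul x1, x2, x3")))  # 0x23100b3
```

Only the `funct3` field differs between the eight instructions, which is why a single lookup table is enough in this sketch.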
Added a test program for the M extension.
`MUL` Functionality Test
- Verifies that the `mul` instruction correctly multiplies two registers and stores the result in the destination register.
- Instruction: `mul x1, x2, x3`
- `x2` is set to `2`.
- `x3` is set to `3`.
- Computation: `x1 = x2 * x3 = 6`
- `x1` should be `6`.

`MULH` Functionality Test
- Verifies that the `mulh` instruction correctly multiplies two registers and stores the high 32 bits of the result in the destination register.
- Instruction: `mulh x1, x2, x3`
- `x2` is set to `1` and then shifted left by 16 bits (`x2 = 65536`).
- `x3` is set to `1` and then shifted left by 16 bits (`x3 = 65536`).
- Computation: `x1 = (65536 * 65536) >> 32 = 1`
- `x1` should be `1`.

`MULHSU` Functionality Test
- Verifies that the `mulhsu` instruction correctly multiplies two registers (with one operand signed and the other unsigned) and stores the high 32 bits of the result in the destination register.
- Instruction: `mulhsu x1, x2, x3`
- `x2` is set to `1` and then shifted left by 16 bits (`x2 = 65536`).
- `x3` is set to `1` and then shifted left by 16 bits (`x3 = 65536`).
- Computation: `x1 = (65536 * 65536) >> 32 = 1`
- `x1` should be `1`.

`MULHU` Functionality Test
- Verifies that the `mulhu` instruction correctly multiplies two unsigned registers and stores the high 32 bits of the result in the destination register.
- Instruction: `mulhu x1, x2, x3`
- `x2` is set to `1` and then shifted left by 16 bits (`x2 = 65536`).
- `x3` is set to `1` and then shifted left by 16 bits (`x3 = 65536`).
- Computation: `x1 = (65536 * 65536) >> 32 = 1`
- `x1` should be `1`

`DIV` Functionality Test
- Verifies that the `div` instruction correctly divides two signed registers and stores the quotient in the destination register.
- Instruction: `div x1, x2, x3`
- `x2` is set to `10`.
- `x3` is set to `3`.
- Computation: `x1 = x2 / x3 = 3`
- `x1` should be `3`.

`DIVU` Functionality Test
- Verifies that the `divu` instruction correctly divides two unsigned registers and stores the quotient in the destination register.
- Instruction: `divu x1, x2, x3`
- `x2` is set to `10`.
- `x3` is set to `3`.
- Computation: `x1 = x2 / x3 = 3`
- `x1` should be `3`.

`REM` Functionality Test
- Verifies that the `rem` instruction correctly computes the signed remainder of two registers and stores it in the destination register.
- Instruction: `rem x1, x2, x3`
- `x2` is set to `10`.
- `x3` is set to `3`.
- Computation: `x1 = x2 % x3 = 1`
- `x1` should be `1`.

`REMU` Functionality Test
- Verifies that the `remu` instruction correctly computes the unsigned remainder of two registers and stores it in the destination register.
- Instruction: `remu x1, x2, x3`
- `x2` is set to `10`.
- `x3` is set to `3`.
- Computation: `x1 = x2 % x3 = 1`
- `x1` should be `1`

Tests can be executed using `sbt test` to run all tests or with `sbt "testOnly mrv.DecoderSpec"` to run a specific test. Various test cases have been used to validate the functionality, and the results are displayed in the Test Results.
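
The expected values used in the M-extension tests can be cross-checked against a small software reference model. The following Python sketch models the eight RV32M operations on 32-bit values, including the division-by-zero and signed-overflow results defined by the RISC-V specification. It is an independent illustration, not the ALU code from the repository, and all function names are hypothetical.

```python
# Sketch only: a software reference model for the RV32M operations.
# Inputs and results are 32-bit register values (0 .. 2**32 - 1).

MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a, b):
    return (to_signed(a) * to_signed(b)) & MASK32


def mulh(a, b):      # signed x signed, upper 32 bits
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32


def mulhsu(a, b):    # signed x unsigned, upper 32 bits
    return ((to_signed(a) * (b & MASK32)) >> 32) & MASK32


def mulhu(a, b):     # unsigned x unsigned, upper 32 bits
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def div(a, b):
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return MASK32                    # division by zero yields -1
    q = abs(sa) // abs(sb)               # truncate toward zero
    if (sa < 0) != (sb < 0):
        q = -q
    return q & MASK32                    # -2**31 / -1 wraps to -2**31


def divu(a, b):
    return MASK32 if (b & MASK32) == 0 else (a & MASK32) // (b & MASK32)


def rem(a, b):
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return a & MASK32                # remainder by zero yields the dividend
    r = abs(sa) % abs(sb)
    return (-r if sa < 0 else r) & MASK32  # sign follows the dividend


def remu(a, b):
    return (a & MASK32) if (b & MASK32) == 0 else (a & MASK32) % (b & MASK32)


print(mul(2, 3), mulh(65536, 65536), div(10, 3), rem(10, 3))  # 6 1 3 1
```

Python integers are unbounded, so the model masks every result to 32 bits explicitly; hardware gets this truncation for free from the register width.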
After upgrading the RISC-V CPU from `RV32E` to `RV32IM`, I will transition it from a single-cycle design to a 3-stage pipeline. The traditional 5-stage pipeline consists of Instruction Fetch, Instruction Decode, Execution, Memory Access, and Write-Back stages. In my 3-stage pipeline, the CPU is reorganized into the following three stages:
To enable the CPU to function in a pipelined architecture, temporary registers need to be placed between the different stages. These registers hold the data necessary for the next stage as well as the results produced by the preceding one. Proper handling of these stage registers and guaranteeing the accurate transfer of information is essential for the pipeline to operate correctly.
Pipeline registers are used to hold and pass the necessary data and control signals from one stage of the pipeline to the next.
Used to transfer data between the `instruction fetch` stage and the `instruction decode` stage of the processor.
Used to transfer data out of the `execution` stage of the processor.
This register is used to handle data hazards by forwarding the results of the ALU calculations back before they are written back.
When a data hazard occurs, the pipeline must `stall` or `forward` data to avoid incorrect results.

Stalling:
- A hazard is detected when `ex_mem`'s `rd` matches `decoder`'s `rs1` or `rs2` and `regWrite` is true.
- The pipeline uses a `stall` signal to pause the PC and halt the pipeline progression.
- When a hazard is detected, the `stall` signal is asserted (`stall := true.B`).

Forwarding:
- Forwarding applies when the `rd` (destination register) from a later pipeline stage matches `rs1` or `rs2` of the current instruction:
  - If the EX/MEM stage writes a register (`regWrite = true.B`) and its `rd` matches the current `rs1` or `rs2`, use the `aluOut` from the EX/MEM stage as the forwarded value.
  - If the MEM/WB stage writes a register (`regWrite = true.B`) and its `rd` matches the current `rs1` or `rs2`, use the `wbData` from the MEM/WB stage.
- Forwarding updates `rs1Data` and `rs2Data` by dynamically selecting the most recent value available in the pipeline.

Example:
- Instruction 1: `add x5, x1, x2` (produces result in `x5`).
- Instruction 2: `sub x6, x5, x3` (requires the result from `x5`).
- With forwarding, Instruction 2 uses `aluOut` from Instruction 1 directly, avoiding a stall.

I converted the original OneCycle program into a ThreeStage program and added simple stalling and forwarding to avoid data hazards. My 3-stage pipeline program is `ThreeStage.scala`, paired with the test program `ThreeStageSpec.scala`.
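
The forwarding priority described above (EX/MEM first, then MEM/WB, otherwise the register file) can be modelled in a few lines of Python. This is a behavioural sketch, not the Chisel code in `ThreeStage.scala`: the `StageReg` class and the `forward` function are hypothetical names, and the check that skips register `x0` is an assumption that the text above does not state.

```python
# Sketch only: select the most recent value of a source register.
from dataclasses import dataclass


@dataclass
class StageReg:
    reg_write: bool  # the instruction in this stage writes a register
    rd: int          # its destination register index
    value: int       # aluOut for EX/MEM, wbData for MEM/WB


def forward(rs: int, reg_file_value: int,
            ex_mem: StageReg, mem_wb: StageReg) -> int:
    """Return the operand value for source register `rs`."""
    # Assumption: x0 is hard-wired to zero, so it is never forwarded.
    if rs != 0 and ex_mem.reg_write and ex_mem.rd == rs:
        return ex_mem.value       # newest result, still in EX/MEM
    if rs != 0 and mem_wb.reg_write and mem_wb.rd == rs:
        return mem_wb.value       # older result, about to be written back
    return reg_file_value         # no hazard: use the register file


# add x5, x1, x2 has produced 7 in EX/MEM; sub x6, x5, x3 now reads x5.
ex_mem = StageReg(reg_write=True, rd=5, value=7)
mem_wb = StageReg(reg_write=False, rd=0, value=0)
print(forward(5, 0, ex_mem, mem_wb))  # 7
```

Checking EX/MEM before MEM/WB matters when both stages target the same register, because the EX/MEM result comes from the younger instruction.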
Two tests are included. The first ensures that the simulator handles RAW hazards correctly, maintains the correct execution order, and resolves the dependencies between registers.
The other focuses on memory operations, checking whether the simulator correctly supports data load and store instructions.
RAW Hazard Test
- `addi x1, x0, 10` - Write 10 to register `x1`.
- `addi x2, x1, 5` - Add 5 to the value in `x1` and write the result to `x2`.
- `x1` should be 10.
- `x2` should be 15.

Memory Load/Store Test
- Tests the `load (lw)` and `store (sw)` operations.
- `addi x1, x0, 5` - Write 5 to register `x1`.
- `sw x1, 100(x0)` - Store the value of `x1` (5) into memory at address 100.
- `lw x2, 100(x0)` - Load the value from memory at address 100 into `x2`.
- `x1` should be 5.
- `x2` should be 5.

Tests can be executed using `sbt test` to run all tests or with `sbt "testOnly mrv.ThreeStageSpec"` to run a specific test. Various test cases have been used to validate the functionality, and the results are displayed in the Test Results.
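
The expected register values in the RAW hazard and load/store tests are simply what a non-pipelined machine would produce, so they can be derived from a sequential golden model. The Python sketch below executes the two programs one instruction at a time. It is an illustration only: the tuple format, the `run` function, and the word-granular memory dictionary are simplifying assumptions and do not mirror `ThreeStageSpec.scala`.

```python
# Sketch only: a sequential golden model for addi / sw / lw.
MASK32 = 0xFFFFFFFF


def run(program):
    """Execute (op, a, b, c) tuples in order and return the 32 registers."""
    regs = [0] * 32
    mem = {}  # assumption: one 32-bit word stored per byte address used
    for op, a, b, c in program:
        if op == "addi":              # addi rd=a, rs1=b, imm=c
            value = (regs[b] + c) & MASK32
            if a != 0:                # x0 stays zero
                regs[a] = value
        elif op == "sw":              # sw rs2=a, imm=b (rs1=c)
            mem[(regs[c] + b) & MASK32] = regs[a]
        elif op == "lw":              # lw rd=a, imm=b (rs1=c)
            if a != 0:
                regs[a] = mem.get((regs[c] + b) & MASK32, 0)
        else:
            raise ValueError(f"unsupported op: {op}")
    return regs


raw = run([("addi", 1, 0, 10), ("addi", 2, 1, 5)])
print(raw[1], raw[2])  # 10 15

ldst = run([("addi", 1, 0, 5), ("sw", 1, 100, 0), ("lw", 2, 100, 0)])
print(ldst[1], ldst[2])  # 5 5
```

A pipelined implementation passes these tests only if its stalling and forwarding make it behave like this sequential model.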
This function, `fabsf`, is a custom implementation of the standard C function fabsf, which calculates the absolute value of a floating-point number.
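
The custom implementation itself is not reproduced here. A common way to write such a function, assumed for this sketch, is to clear the sign bit (bit 31) of the IEEE 754 single-precision encoding. The Python model below shows that technique; the helper name `fabsf_bits` is hypothetical.

```python
# Sketch only: absolute value of a float32 by clearing its sign bit.
# This is an assumed technique, not necessarily the author's implementation.
import struct


def fabsf_bits(bits: int) -> int:
    """Clear bit 31 of a 32-bit IEEE 754 single-precision pattern."""
    return bits & 0x7FFFFFFF


def fabsf(x: float) -> float:
    """Round-trip a Python float through float32 bits and clear the sign."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (result,) = struct.unpack("<f", struct.pack("<I", fabsf_bits(bits)))
    return result


print(hex(fabsf_bits(0xC0A00000)))  # 0x40a00000, i.e. -5.0f becomes 5.0f
print(fabsf(-5.0))                  # 5.0
```

On the CPU this corresponds to a single bitwise AND with the mask `0x7FFFFFFF`, so it needs no floating-point hardware.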
Absolute value Test (fabsf)
For the operation `|x|` where the input \( x = -5 \) (testing the computation of absolute value via subtraction):
Instructions:
- `addi x1, x0, -5` - Load the value \(-5\) into register `x1`.
- `sub x1, x0, x1` - Subtract `x1` from `x0` to compute its absolute value (resulting in \( 5 \)). This negation yields the absolute value here only because the input is negative.
- `add x2, x1, x0` - Copy the result from `x1` to `x2` (to verify the result).

Execution Steps:
- Execute the instructions and read back `x1` and `x2`.

Expected Results:
- After the first instruction (`addi x1, x0, -5`), `x1` should hold the value \(-5\).
- After the second instruction (`sub x1, x0, x1`), `x1` should hold the value \( 5 \) (absolute value of the initial value).
- After the third instruction (`add x2, x1, x0`), `x2` should hold the value \( 5 \), confirming that the absolute value computation was correct.

Verification:
- `x1` should be \( 5 \) after all instructions complete.
- `x2` should also be \( 5 \), as it mirrors the value of `x1`.

Fix the permissions of the uploaded pictures.
The formula defines a recursive calculation for a value \( nm \) based on the input values \( n \) and \( m \). The calculation varies depending on the properties of \( n \), as follows:
If \( n \) is even:
The value of \( nm \) is calculated as
\[ nm = \frac{n}{2} \cdot 2m \]
This means \( n \) is halved, and the result is multiplied by \( 2m \).
If \( n \) is odd:
The value of \( nm \) is calculated as
\[ nm = \frac{n - 1}{2} \cdot 2m + m \]
Here, \( n \) is reduced by 1 to make it even, halved, and multiplied by \( 2m \). Then, \( m \) is added to the result.
If \( n = 1 \):
The value of \( nm \) is simply \( m \).
This serves as the base case for the recursion or iterative process.
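
The three cases translate directly into a recursive function. The Python sketch below follows them literally for \( n \ge 1 \); the function name `multiply` is hypothetical, and rejecting \( n < 1 \) is an assumption, since the formula does not define that case.

```python
# Sketch only: the halve-and-double recursion for n * m, with n >= 1.
def multiply(n: int, m: int) -> int:
    if n < 1:
        raise ValueError("the recursion is only defined for n >= 1")
    if n == 1:                 # base case: 1 * m = m
        return m
    if n % 2 == 0:             # even n: (n / 2) * (2m)
        return multiply(n // 2, 2 * m)
    # odd n: ((n - 1) / 2) * (2m) + m
    return multiply((n - 1) // 2, 2 * m) + m


print(multiply(6, 7))  # 42
```

Each step halves \( n \), so the recursion depth grows with the number of bits in \( n \) rather than with \( n \) itself.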
Multiplication Test (n*m via mul)
For every combination of \( n \) and \( m \) where \( n, m \in \{1, 2, \ldots, 7\} \) (49 total combinations):
Instructions:
- `addi x2, x0, n` - Load the value \( n \) into register `x2`.
- `addi x3, x0, m` - Load the value \( m \) into register `x3`.
- `mul x1, x2, x3` - Multiply the values in `x2` and `x3` and store the result in `x1`.

Execution Steps:
- After the first instruction (`addi x2, x0, n`), the simulator waits for the instruction to complete (3 clock cycles total).
- After the second instruction (`addi x3, x0, m`), the simulator again waits for the instruction to complete (3 clock cycles total).
- After the third instruction (`mul x1, x2, x3`), the simulator waits for the multiplication to complete (3 clock cycles total).

Expected Results:
- `x2` should equal \( n \).
- `x3` should equal \( m \).
- `x1` should equal \( n * m \).

This code snippet calculates the position of the most significant bit (MSB) in a non-negative integer `N`.
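
As a behavioural reference for the MSB calculation, the Python sketch below counts right shifts until the value reaches zero and then subtracts one, which is the procedure the execution steps of the Logint test walk through. The function name `logint` and the handling of negative input are assumptions made for this illustration.

```python
# Sketch only: position of the most significant set bit of n.
def logint(n: int) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")
    k, i = n, 0
    while k:          # shift right until nothing is left
        k >>= 1
        i += 1        # count the shifts performed
    return i - 1      # zero-based MSB position (-1 when n is 0)


print(logint(16))  # 4
```

For `16` (binary `10000`) the loop runs five times, so the result is `5 - 1 = 4`.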
Logint test
Assembly code
Execution Steps:
- `x2` is set to 16 (the initial value of \( N \)).
- `x3` is initialized to 0 (counter for iterations).
- Shift `x2` right by 1 bit (dividing it by 2).
- Increment `x3` to count the number of shifts performed.
- `x4` is calculated as `x3-1` to account for zero-based indexing of the MSB position.

Expected Results:
- `x4` should hold the value 4, indicating that the MSB of 16 (binary 10000) is at position 4.

Verification:
- Check that `x4` is updated correctly after all instructions are executed.
- Check that `x4` matches the calculated MSB position.

Do refer to the lecture materials and/or primary source.