Contributed by < JinYu1225 >
The objective of this assignment is to develop a single-cycle RISC-V CPU, named MyCPU, using Chisel, a hardware description language based on Scala. The process involves forking and modifying code from ca2023-lab3 to complete the CPU construction. The final step is to execute the project developed in Assignment2 on the MyCPU.
The complete project can be accessed here
For further details on Chisel, Scala, and CPU construction as part of this assignment, refer to Lab3: Construct a single-cycle RISC-V CPU with Chisel.
`Hello` is a module that includes a slot named `io`, used to define an unnamed bundle with an output wire named `led`. Additionally, `Hello` features a counter called `cntReg` and a flag called `blkReg`, both of which connect to the output `led`. The flag `blkReg` is triggered when the counter `cntReg` reaches the value of `CNT_MAX`.
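The counter-and-flag behaviour described above can be mimicked with a rough cycle-by-cycle model in plain Python. This is only a sketch: the class name `HelloModel`, the tiny `cnt_max` value, and the assumption that the counter wraps to zero while the flag toggles when `CNT_MAX` is reached are illustrative and not taken from the lab code.

```python
class HelloModel:
    """Software model of a counter that flips an LED flag at cnt_max."""

    def __init__(self, cnt_max):
        self.cnt_max = cnt_max
        self.cnt_reg = 0  # models cntReg
        self.blk_reg = 0  # models blkReg, which drives led

    def step(self):
        """Advance one clock cycle; return the led value seen before the edge."""
        led = self.blk_reg
        if self.cnt_reg == self.cnt_max:
            self.cnt_reg = 0   # assumption: the counter wraps around
            self.blk_reg ^= 1  # assumption: the flag toggles
        else:
            self.cnt_reg += 1
        return led


model = HelloModel(cnt_max=3)
trace = [model.step() for _ in range(10)]
print(trace)
```

With a small `cnt_max` the LED value flips every `cnt_max + 1` cycles, which makes the blinking pattern easy to see in the printed trace.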
The project comprises six tests designed for various purposes, and it is crucial to pass all of them to ensure the proper functioning of MyCPU, particularly for basic functionalities. The key .scala files that require modification are `InstructionFetch.scala`, `InstructionDecode.scala`, `Execute.scala`, and `CPU.scala`.
Run all 6 tests:
Result:
It is evident that five tests have failed. One test has passed due to the modifications made in `InstructionFetch.scala`, which will be elaborated on in the following discussion.
The instruction fetch (IF) stage is implemented in `InstructionFetch.scala`.
MyCPU fetches instructions from the address specified by the Program Counter (PC) and determines the next PC. This can be implemented by distinguishing two cases: whether the instruction jumps or not.
:warning: Refrain from copying and pasting your solution directly into the HackMD note. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
Another important point is that we should always check whether the instruction input is valid. When it is not, the stage outputs `0x00000013`, which is defined as `nop` in `InstructionDecode.scala`.
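The fetch behaviour described above (jump or fall through, plus the validity check) can be summarised in a small Python model. It is a hedged sketch, not the Chisel solution: the function name `fetch_step` and its parameters are hypothetical, and holding the PC while the instruction input is invalid is an assumption.

```python
NOP = 0x00000013  # addi x0, x0, 0


def fetch_step(pc, instruction_valid, instruction_read_data, jump_flag, jump_address):
    """Return (next_pc, instruction) for one clock cycle of the fetch stage."""
    if not instruction_valid:
        # Assumption: hold the PC and feed a NOP while the input is invalid.
        return pc, NOP
    if jump_flag:
        next_pc = jump_address
    else:
        next_pc = (pc + 4) & 0xFFFFFFFF  # sequential fetch, 32-bit wrap
    return next_pc, instruction_read_data


print(hex(fetch_step(0x1000, True, 0x00A02223, False, 0)[0]))
```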
Instruction Decode is implemented in `InstructionDecode.scala`. The purpose of the decode stage is to assist the CPU in recognizing each type of operation and determining the corresponding action to be taken. It involves the following steps.
Let's use the ADD instruction as an example, whose opcode is represented as `0b0110011`. A more comprehensive understanding can be gained by referring to the following table.
First, the instruction fetched by the IF stage arrives as the input. It is then divided into six fields: `opcode`, `funct3`, `funct7`, `rd`, `rs1`, and `rs2`, as shown in the figure above.
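The field split can be checked by hand with a few shifts and masks. The sketch below uses the standard RISC-V R-type bit positions; the helper name `split_fields` and the sample encoding of `add x3, x2, x1` are chosen here for illustration.

```python
def split_fields(instruction):
    """Slice a 32-bit RISC-V instruction into its R-type fields."""
    return {
        "opcode": instruction & 0x7F,          # bits 6:0
        "rd": (instruction >> 7) & 0x1F,       # bits 11:7
        "funct3": (instruction >> 12) & 0x7,   # bits 14:12
        "rs1": (instruction >> 15) & 0x1F,     # bits 19:15
        "rs2": (instruction >> 20) & 0x1F,     # bits 24:20
        "funct7": (instruction >> 25) & 0x7F,  # bits 31:25
    }


fields = split_fields(0x001101B3)  # add x3, x2, x1
print(fields)
```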
To ensure the correct operation of MyCPU, it is necessary to modify the control signal of MemRW in the Decode stage. This modification should be based on the types of operations being performed.
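As a rough illustration of how the memory control signals depend on the operation type, the sketch below derives a read enable and a write enable from the opcode. The opcode values are the standard RISC-V load and store opcodes; the function name and the dictionary keys are hypothetical and only loosely follow the signal names used in this note.

```python
OPCODE_LOAD = 0b0000011   # lb, lh, lw, lbu, lhu
OPCODE_STORE = 0b0100011  # sb, sh, sw


def memory_enables(instruction):
    """Derive the memory read/write enables from the opcode field."""
    opcode = instruction & 0x7F
    return {
        "memory_read_enable": opcode == OPCODE_LOAD,
        "memory_write_enable": opcode == OPCODE_STORE,
    }


print(memory_enables(0x00A02223))  # sw x10, 4(x0)
```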
The Execute stage is where MyCPU carries out the arithmetic processes of each instruction using the ALU. In this stage, the control signals and arguments are obtained from the ID stage, and a specific operation is performed in the ALU. Then the result will be passed to the next stage.
`ALU.scala` and `ALUControl.scala` are utilized in the Execute stage. ALUControl produces an output specifying the type of operation that the ALU needs to perform. The inputs of the ALU are determined by the results of ALUControl and the previous stage, as illustrated by the inputs provided through ALUop1 and ALUop2.
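The cooperation between ALUControl, operand selection, and the ALU can be sketched in Python as follows. Only a small subset of operations is modelled, all function names are hypothetical, and the source encoding (`1` selects the instruction address for operand 1 and the immediate for operand 2, `0` selects a register) follows the values reported in the waveform discussion of this note.

```python
MASK32 = 0xFFFFFFFF


def alu_control(opcode, funct3, funct7):
    """Choose an ALU function name for a small, illustrative subset."""
    if opcode == 0b0110011:  # register-register arithmetic
        table = {
            (0b000, 0b0000000): "add",
            (0b000, 0b0100000): "sub",
            (0b100, 0b0000000): "xor",
            (0b110, 0b0000000): "or",
            (0b111, 0b0000000): "and",
        }
        return table[(funct3, funct7)]  # KeyError if not modelled here
    if opcode == 0b0010011 and funct3 == 0b000:  # addi
        return "add"
    raise NotImplementedError("opcode/funct3 not covered by this sketch")


def select_operands(aluop1_source, aluop2_source, instruction_address,
                    reg1_data, reg2_data, immediate):
    """Pick the two ALU inputs from the decode-stage source selectors."""
    op1 = instruction_address if aluop1_source == 1 else reg1_data
    op2 = immediate if aluop2_source == 1 else reg2_data
    return op1, op2


def alu(function, op1, op2):
    """Compute the selected function and truncate the result to 32 bits."""
    results = {
        "add": op1 + op2,
        "sub": op1 - op2,
        "xor": op1 ^ op2,
        "or": op1 | op2,
        "and": op1 & op2,
    }
    return results[function] & MASK32


# addi sp, sp, -12 with sp == 0: op1 comes from the register, op2 from the immediate.
op1, op2 = select_operands(0, 1, 0x1000, 0, 0x5, 0xFFFFFFF4)
print(hex(alu(alu_control(0b0010011, 0b000, 0b1111111), op1, op2)))
```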
The CPU assumes a crucial role in coordinating the connection between each stage. Each stage is declared as a module variable within the CPU. The code establishes the necessary connections of inputs and outputs among these stages in the CPU to ensure the proper functioning of the single-cycle CPU.
The connections of inputs and outputs are shown in the following figure.
Generate waveform files during tests:
The waveform file (`.vcd`) will be generated under the `test_run_dir` directory.
The `reset` signal is typically set to `HIGH` to ensure the hardware resets to its initial state at the beginning of a functional test. Additionally, `io_instruction_valid` is set to `LOW` to output the No Operation (NOP) instruction `0x00000013` to `io_instruction`. The necessary input for the test is set during this period.
Subsequently, at the falling edge of the clock, `reset` and `io_instruction_valid` are set to `LOW` and `HIGH` respectively, in preparation for the input at the subsequent rising edge.
`pc` is set to `io_jump_address_id` = `0x1000` because the signal `io_jump_flag_id` is `HIGH` at the rising edge shown above.
`pc` is set to the value of `pc+4` when `io_jump_flag_id` remains `LOW`.
The first `io_instruction` is `0x00A02223` = `0b0000 0000 1010 0000 0010 0010 0010 0011`. The corresponding `opcode` should be `0b010 0011`, which means an S-type instruction.
- `io_memory_write_enable` is set to `HIGH`, and `reg_write_enable` and `memory_read_enable` are set to `LOW`.
- `io_ex_aluop1_source` = `0` = Register, and `io_ex_aluop2_source` = `1` = Immediate.
- `funct3` = `0b010` indicates it is `sw`, with imm = `0b0 0100`, rs1 = `0b0 0000`, rs2 = `0b0 1010`.
- `io_regs_reg1_read_address` = `0`, `io_regs_reg2_read_address` = `0xA`, `io_ex_immediate` = `0x4`.
We can make sure the decode stage works properly by checking the instruction input and the corresponding outputs.
There is an interesting point in this test: the output signals change at the falling edge of the clock, while the IF stage triggers at the rising edge.
This could be caused by the structure of the single-cycle CPU. There is no register to store the terminal signals except in the IF stage. Therefore, rising-edge triggering only appears when we test the IF stage. Otherwise, the output changes right after we give an input to the module.
In this test, the circuit is tested 100 times with RISC-V code that represents `x3 = x2 + x1`, followed by a few tests of the function `pc + 2 if x1 === x2`.

`x3 = x2 + x1`:
- `io_if_jump_flag` remains `0` for the `add` function.
- The operands are `0x0a45c5af` and `0x1486d599`, and the calculated output is `0x1ECC9B48`.

`pc + 2 if x1 === x2`:
- `io_if_jump_flag` = `0` because `io_reg1_data` does not equal `io_reg2_data`.
- When the two values are equal, `io_if_jump_flag` = `1`, and `io_if_jump_address` equals `io_instruction_address` + `2`.
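Both checks in this test can be reproduced with a small Python model of the Execute stage. It is a sketch under assumptions: the function names are hypothetical, and the sample instruction address `0x1000` is arbitrary; only the operand values and the `+ 2` offset come from the test described above.

```python
MASK32 = 0xFFFFFFFF


def execute_add(reg1_data, reg2_data):
    """Model of add: a 32-bit wrap-around sum that never raises the jump flag."""
    return {"alu_result": (reg1_data + reg2_data) & MASK32, "if_jump_flag": False}


def execute_beq(instruction_address, immediate, reg1_data, reg2_data):
    """Model of beq: jump to instruction_address + immediate when the registers match."""
    return {
        "if_jump_flag": reg1_data == reg2_data,
        "if_jump_address": (instruction_address + immediate) & MASK32,
    }


print(hex(execute_add(0x0A45C5AF, 0x1486D599)["alu_result"]))
print(execute_beq(0x1000, 2, 5, 5))
```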
read the writing content:
- When `write_enable` = `1`, the signal `io_read_data1` changes to `0xDEADBEEF` at the next rising edge, based on the values of the signals `io_read_address1`, `io_write_address`, and `io_write_data`.

read the written content:
- `io_read_data1` doesn't change during this period.
- Add `c.clock.step()` to the test bench.
- `io_read_data1` changes to the corresponding result after 2 clock cycles.

Add `HammingDistanceTest` into `CPUTest.scala`, and use `/csrc` to generate `HammingDistance.asmbin`.
The test initially failed due to an oversight in the code. Upon reviewing the code, I discovered that in the `HW2` version, there was a termination triggered by an ecall placed in the middle of the code. When transitioning to `MyCPU`, this termination point was not present, causing the program to continue execution. Consequently, the final result deviated from expectations. The issue was rectified by moving the termination point to the end of the code, after which normal results were achieved.
Use Verilator to check the waveform and quickly test the programs. The following code should be executed every time the source Chisel file is modified, to generate the corresponding Verilog file.
Then we get an executable file `VTop`. This executable can run program files with the following parameters.
Parameter | Usage |
---|---|
`-memory` | Specify the size of the simulation memory in words (4 bytes each). Example: `-memory 4096` |
`-instruction` | Specify the RISC-V program used to initialize the simulation memory. Example: `-instruction src/main/resources/hello.asmbin` |
`-signature` | Specify the memory range and destination file to output after simulation. Example: `-signature 0x100 0x200 mem.txt` |
`-halt` | Specify the halt identifier address; writing `0xBABECAFE` to this memory address stops the simulation. Example: `-halt 0x8000` |
`-vcd` | Specify the filename for saving the simulation waveform during the process; not specifying this parameter will not generate a waveform file. Example: `-vcd dump.vcd` |
`-time` | Specify the maximum simulation time; note that time is twice the number of cycles. Example: `-time 1000` |
Load `HammingDistance.asmbin`, simulate for 2000 cycles, and save the simulation waveform to `dumpH.vcd`.
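To keep the "time is twice the number of cycles" rule from the parameter table straight, the argument list can be assembled with a small helper. This is a hypothetical convenience function, not part of the lab tooling; the program path below is an assumption modelled on the `hello.asmbin` example in the table, and the result still has to be passed to however `VTop` is launched in your setup.

```python
def vtop_arguments(program, cycles, vcd_file=None, memory_words=None):
    """Build a VTop argument list from the flags documented in the table."""
    # The -time value counts half cycles, so it is twice the cycle count.
    args = ["-instruction", program, "-time", str(2 * cycles)]
    if memory_words is not None:
        args += ["-memory", str(memory_words)]
    if vcd_file is not None:
        args += ["-vcd", vcd_file]
    return args


# Assumed location of the binary; adjust to where the .asmbin actually lives.
args = vtop_arguments("src/main/resources/HammingDistance.asmbin",
                      cycles=2000, vcd_file="dumpH.vcd")
print(" ".join(args))
```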
Take the following instruction as an I-type example.
0xFF410113
= 0b 1111 1111 0100 0001 0000 0001 0001 0011
According to the Reference Data Card, this code represents the following instruction.
addi sp sp -12
imm[11:0] | rs1 | funct3 | rd | Opcode |
---|---|---|---|---|
1111 1111 0100 | 0 0010 | 000 | 0 0010 | 001 0011 |
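The table can be verified by decoding the word in Python. The sketch uses the standard I-type layout and sign-extends the 12-bit immediate; the helper names `sign_extend` and `decode_i_type` are chosen here for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_i_type(instruction):
    """Split a 32-bit I-type instruction into its fields."""
    return {
        "opcode": instruction & 0x7F,
        "rd": (instruction >> 7) & 0x1F,
        "funct3": (instruction >> 12) & 0x7,
        "rs1": (instruction >> 15) & 0x1F,
        "imm": sign_extend(instruction >> 20, 12),
    }


decoded = decode_i_type(0xFF410113)  # addi sp, sp, -12
print(decoded)
print(hex(decoded["imm"] & 0xFFFFFFFF))  # 32-bit view of the immediate
```

Register number 2 is `sp`, so both `rs1` and `rd` refer to the stack pointer, and the 32-bit view of the immediate is `0xFFFFFFF4`.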
IF:
- The instruction at `io_instruction_address` = `0x1000` is assigned to `io_instruction_read_data`, and is then passed to `io_instruction` when the signal `io_instruction_valid` turns HIGH.
- `io_jump_flag_id` is LOW, so the next `pc` will be `pc+4`.

ID:
- `io_ex_aluop1_source` and `io_ex_aluop2_source` were assigned `0` and `1`, which represent `Register` and `Immediate`.
- `io_ex_immediate` = `0b1...10100`, which means `-12` in decimal.
- The memory read and write enable signals were assigned `0` for I-type. Concurrently, the write enable signal for the register was assigned `1` since the result should be passed back to the register `rd`.

EXE:
- The jump flag was `0` since the `opcode` did not belong to a JAL, JALR, or B-type instruction.
- `op1` and `op2` of `alu` were assigned `0` from `sp` and `-12` from `imm`, according to `aluop1_source` and `aluop2_source`.
- The addition of `sp` and `-12` was performed by the ALU according to the `opcode` and `funct3`, which represent `addi`.

MEM:

WB:
- `reg_write_data` in the WB stage is decided by the instruction type decoded in the ID stage. RM, I, lui, and auipc type instructions take `alu_result` as the source data.
- Load instructions take `memory_read_data` as the data source, and the JAL and JALR types take `instruction_address + 4`.

Take the following instruction as a B-type example.
0x04040A63
= 0b0000 0100 0000 0100 0000 1010 0110 0011
beq s0 x0 EXIT_HAMDIS
imm[12,10:5] | rs2 | rs1 | funct3 | imm[4:1,11] | opcode |
---|---|---|---|---|---|
000 0010 | 00000 | 01000 | 000 | 10100 | 110 0011 |
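Because the B-type immediate is scattered across the instruction word, it helps to reassemble it explicitly. The sketch below follows the standard RISC-V B-type layout; the function name is hypothetical, and the second sample word (`beq x0, x0, -4`) is added here only to exercise the sign bit.

```python
def b_type_immediate(instruction):
    """Reassemble and sign-extend the 13-bit B-type branch offset."""
    imm = (
        ((instruction >> 31) & 0x1) << 12     # imm[12]
        | ((instruction >> 7) & 0x1) << 11    # imm[11]
        | ((instruction >> 25) & 0x3F) << 5   # imm[10:5]
        | ((instruction >> 8) & 0xF) << 1     # imm[4:1], imm[0] is always 0
    )
    return imm - (1 << 13) if imm & (1 << 12) else imm


print(b_type_immediate(0x04040A63))  # offset of beq s0, x0, EXIT_HAMDIS
print(b_type_immediate(0xFE000EE3))  # beq x0, x0, -4
```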
IF:
- `jump_flag` is assigned HIGH, and the `pc` will be `jump_address_id` in the next cycle.

ID:
- The `beq` instruction was identified from the `opcode` and `funct3`.
- The `imm` for B-type was generated and assigned to `ex_immediate`.
- `aluop1_source` and `aluop2_source` were assigned as `register` and `imm`, because whether the branch is taken in this CPU is determined in the jump judge unit. Normally, the jump address would be calculated by the ALU, but in this CPU it is computed in the EXE stage without the ALU module.

EXE:
- `if_jump_address` was calculated from `instruction_address` + `immediate`, and `if_jump_flag` was determined by the `opcode` and `funct3`.
- Although `if_jump_address` is calculated outside the `ALU` module in the module code, we can still see that `alu_io_result` has the same value as `if_jump_address`. Therefore, the ALU could be used to calculate the B-type jump address instead of the extra logic.

There was nothing to do in the MEM and WB stages for the B-type instruction.
Take the following instruction as a JAL example.
0x010000EF
= 0b 0000 0001 0000 0000 0000 0000 1110 1111
jal ra HAMDIS
imm[20,10:1,11,19:12] | rd | opcode |
---|---|---|
0000 0001 0000 0000 0000 | 0 0001 | 110 1111 |
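The J-type immediate is also stored out of order, so a short Python check is useful. The layout is the standard RISC-V one; the function name is hypothetical, and the instruction address `0x103C` is the value reported for this instruction in the waveform discussion.

```python
def j_type_immediate(instruction):
    """Reassemble and sign-extend the 21-bit JAL offset."""
    imm = (
        ((instruction >> 31) & 0x1) << 20      # imm[20]
        | ((instruction >> 12) & 0xFF) << 12   # imm[19:12]
        | ((instruction >> 20) & 0x1) << 11    # imm[11]
        | ((instruction >> 21) & 0x3FF) << 1   # imm[10:1], imm[0] is always 0
    )
    return imm - (1 << 21) if imm & (1 << 20) else imm


instruction_address = 0x103C
immediate = j_type_immediate(0x010000EF)       # jal ra, HAMDIS
jump_address = instruction_address + immediate
link_value = instruction_address + 4           # value written to rd (ra)
print(hex(immediate), hex(jump_address), hex(link_value))
```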
IF:
- `jump_flag` was set to HIGH, and the corresponding `jump_address` was `0x104C`. We can see that the value of `pc` was `0x104C` in the next cycle.

ID:
- `reg_write_enable` was set to HIGH to store `pc + 4` into `rd`, which was assigned to `reg_write_address`.
- `aluop1` and `aluop2` were both assigned `1`, representing `Instruction address` and `Imm`.

EXE:
- `if_jump_flag` was set to HIGH as the instruction was JAL.
- `if_jump_address` was assigned `instruction_address` + `immediate` = `0x103C + 0x10` = `0x104C`.

WB:
- `regs_write_data` was assigned `instruction_address + 4` = `0x1040`.

We take the following S-type instruction as an example in this section.
0x00512023
= 0b 0000 0000 0101 0001 0010 0000 0010 0011
which also means sw t0 0(sp)
imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
---|---|---|---|---|---|
0000 000 | 00101 | 00010 | 010 | 0 0000 | 010 0011 |
Because the result of the IF stage only differs when the instruction is JAL, JALR, or B-type, we skip the IF stage analysis for S-type instructions.

ID:
- `t0` was assigned to `reg2_read_address` and would be used in the MEM stage, since the control signal `memory_write_enable` was HIGH.

MEM:
- `memory_write_enable` was HIGH.
- Since `funct3` revealed that the instruction was `sw`, the `strobe` signals from 0 to 3 were all set to 1, corresponding to the size of a word.
- `memory_bundle_address` = `0xFFFFFFF4` was the same as the address in `sp` that we had moved with the previous instruction `addi sp sp -12`.

The EXE and WB stages do nothing notable for an S-type instruction, so we do not discuss them in this section.
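The store address and byte strobes discussed in this section can be reproduced with a short Python model. It is a sketch under assumptions: the function names are hypothetical, and the rule that strobes start at the byte offset inside the word is an illustrative guess for `sb` and `sh`; only the `sw` case (all four strobes set) is confirmed by the waveform described above.

```python
def decode_s_type(instruction):
    """Split a 32-bit S-type instruction into rs1, rs2, funct3 and immediate."""
    imm = ((instruction >> 25) & 0x7F) << 5 | ((instruction >> 7) & 0x1F)
    if imm & (1 << 11):  # sign-extend the 12-bit immediate
        imm -= 1 << 12
    return {
        "rs1": (instruction >> 15) & 0x1F,
        "rs2": (instruction >> 20) & 0x1F,
        "funct3": (instruction >> 12) & 0x7,
        "imm": imm,
    }


def store_request(instruction, rs1_value):
    """Return (address, strobes) for a store, given the value held in rs1."""
    fields = decode_s_type(instruction)
    width = {0b000: 1, 0b001: 2, 0b010: 4}[fields["funct3"]]  # sb, sh, sw
    address = (rs1_value + fields["imm"]) & 0xFFFFFFFF
    offset = address & 0x3  # assumption: strobes start at the byte offset
    strobes = [offset <= i < offset + width for i in range(4)]
    return address, strobes


# sw t0, 0(sp) with sp == 0xFFFFFFF4 after the earlier addi sp, sp, -12
address, strobes = store_request(0x00512023, 0xFFFFFFF4)
print(hex(address), strobes)
```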