contributed by < paulpeng-popo >
To avoid affecting the host environment, I set up a container as the experimental environment for Assignment 3.
Here, I use Docker to build the container.
FROM arm64v8/ubuntu:22.04
# set the working directory
WORKDIR /root
# set the environment variable
ENV DEBIAN_FRONTEND=noninteractive
# update the repository sources list
RUN apt update
# install sudo
RUN apt install sudo -y
# create a new user as popo
RUN useradd -ms /bin/bash popo
# add the user to sudo group
RUN usermod -aG sudo popo
# set user popo as sudoer without password
RUN echo "popo ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
# change the user to popo
USER popo
# change the working directory to home
WORKDIR /home/popo
# set the environment variable
ENV HOME=/home/popo
ENV USER=popo
ENV PATH=$PATH:/home/popo/.local/bin
# install packages
RUN sudo apt install git wget curl xauth dbus-x11 -y
ENTRYPOINT ["/bin/bash"]
Then, follow the instructions in Lab3: Construct a single-cycle RISC-V CPU with Chisel to install the necessary dependencies and tools.
$ sudo apt install build-essential verilator gtkwave
$ curl -s "https://get.sdkman.io" | bash
$ sdk install java 11.0.21-tem
$ sdk install sbt
# install scala on aarch64 linux
$ curl -fL https://github.com/VirtusLab/coursier-m1/releases/latest/download/cs-aarch64-pc-linux.gz | gzip -d > cs && chmod +x cs && ./cs setup
# change to scala 2
$ cs install scala:2.13.12 scalac:2.13.12
// Hello.scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  // counter value at which the LED toggles
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
The module has two registers, cntReg and blkReg, both initialized to zero. cntReg is a 32-bit counter that increments by 1 every clock cycle. When cntReg reaches CNT_MAX, it resets to zero and blkReg toggles.
I test Hello.scala on chisel-template.
For testing convenience, I reduced the constant from 50,000,000 to 10, which makes changes in the output easier to observe.
The updated CNT_MAX value is therefore 10 / 2 - 1 = 4.
Then create HelloSpec.scala in src/test/scala/example.
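// HelloSpec.scala
// (package declaration omitted; the sbt command below expects this spec in package `example`)
import chisel3._
import chiseltest._
import org.scalatest.freespec.AnyFreeSpec

class HelloSpec extends AnyFreeSpec with ChiselScalatestTester {
  "Hello" in {
    test(new Hello) { hello =>
      for (clk <- 0 until 10) {
        // advance one clock cycle, then sample the LED output
        hello.clock.step(1)
        val led = hello.io.led.peek()
        println(s"clk: $clk, led: $led")
      }
    }
  }
}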
The test steps the simulation forward for 10 clock cycles and, after each step, prints the loop index clk and the value of the led output. It contains no expect calls, so correctness is judged by inspecting the printed values.
$ sbt "testOnly example.HelloSpec"
The output would look like:
clk: 0, led: UInt<1>(0)
clk: 1, led: UInt<1>(0)
clk: 2, led: UInt<1>(0)
clk: 3, led: UInt<1>(0)
clk: 4, led: UInt<1>(1)
clk: 5, led: UInt<1>(1)
clk: 6, led: UInt<1>(1)
clk: 7, led: UInt<1>(1)
clk: 8, led: UInt<1>(1)
clk: 9, led: UInt<1>(0)
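The sequence above can be cross-checked without running the simulator. The following is a minimal plain-Scala sketch (not Chisel) of the two registers, assuming CNT_MAX = 4 and that the LED is sampled after each clock step, as in HelloSpec. The names BlinkModel, step, and ledTrace are made up for this illustration, and the model prints bare integers rather than the UInt<1>(...) form.

```scala
// Plain-Scala model of Hello's two registers (illustration only, no hardware is generated).
object BlinkModel {
  // One clock edge: returns the next (cnt, blk) state.
  def step(cnt: Int, blk: Int, cntMax: Int): (Int, Int) =
    if (cnt == cntMax) (0, blk ^ 1) else (cnt + 1, blk)

  // LED value observed after each of `cycles` clock steps, starting from reset (0, 0).
  def ledTrace(cntMax: Int, cycles: Int): Seq[Int] =
    Iterator.iterate((0, 0)) { case (c, b) => step(c, b, cntMax) }
      .drop(1) // skip the reset state, because the test peeks after stepping
      .take(cycles)
      .map(_._2)
      .toSeq

  def main(args: Array[String]): Unit =
    ledTrace(cntMax = 4, cycles = 10).zipWithIndex.foreach {
      case (led, clk) => println(s"clk: $clk, led: $led")
    }
}
```

The LED first goes high at clk 4 because the counter needs four steps to reach CNT_MAX and one more step for the toggle to become visible in blkReg.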
Using when blocks in Hardware Description Language (HDL) designs is not necessarily something that should be avoided in all cases. However, in some situations, particularly when dealing with simple state machines or conditional assignments, it might be more readable and synthesizable to use multiplexers (muxes) instead of when blocks.
The primary reason for preferring muxes in some cases is that they directly map to hardware multiplexing structures, which synthesis tools can often recognize and implement more efficiently. This is especially true for simple conditions or state machines where a mux can directly represent the selection of one value from several inputs.
-- From ChatGPT --
- cntReg := cntReg + 1.U
- when(cntReg === CNT_MAX) {
-   cntReg := 0.U
-   blkReg := ~blkReg
- }
+ cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
+ blkReg := blkReg ^ (cntReg === CNT_MAX)
Here, a multiplexer selects whether the counter increments or resets to zero. At the same time, a bitwise XOR toggles blkReg: XOR-ing with the comparison result flips blkReg exactly in the cycle where the counter wraps and leaves it unchanged otherwise, which gives the same behavior without a when block.
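For reference, the change applied to the whole module could look like the sketch below. The class name HelloMux and the cntMax constructor parameter are introduced here for illustration only and are not part of the original code. The default of 4 matches the reduced test value.

```scala
import chisel3._

// Sketch of Hello without a when block.
// HelloMux and cntMax are hypothetical names used only in this example.
class HelloMux(cntMax: Int = 4) extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = cntMax.U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  // true in the cycle where the counter holds its maximum value
  val wrap = cntReg === CNT_MAX

  // Mux picks the reset value or the incremented value
  cntReg := Mux(wrap, 0.U, cntReg + 1.U)
  // XOR with 1 flips the bit, XOR with 0 keeps it
  blkReg := blkReg ^ wrap.asUInt

  io.led := blkReg
}
```

Running the same 10-step loop as HelloSpec against this module should produce the same led sequence as the when-based version, since both describe the same next-state function.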
$ WRITE_VCD=1 sbt test
$ gtkwave test_run_dir/<xxx>/<xxx>.vcd
Instruction fetch stage does:
The PC register is initially set to the entry address of the program. When the instruction is valid, the CPU fetches the instruction at the address held in the PC and checks jump_flag_id to decide whether a jump is taken. If so, the PC is updated with the address given by jump_address_id. Otherwise, the PC is incremented by 4 to move to the next sequential instruction.
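The update rule described above can be sketched as a small standalone Chisel module. This is a simplified illustration, not the lab's actual InstructionFetch module: the class name PcSketch, the port names instruction_valid and instruction_address, and the choice to hold the PC when the instruction is not valid are assumptions. Only jump_flag_id and jump_address_id are taken from the text.

```scala
import chisel3._

// Simplified PC update logic (hypothetical module for illustration).
class PcSketch(entry: Int = 0x1000) extends Module {
  val io = IO(new Bundle {
    val instruction_valid   = Input(Bool())      // assumed port name
    val jump_flag_id        = Input(Bool())
    val jump_address_id     = Input(UInt(32.W))
    val instruction_address = Output(UInt(32.W)) // assumed port name
  })

  // PC starts at the program entry address after reset
  val pc = RegInit(entry.U(32.W))

  when(io.instruction_valid) {
    // jump target if a jump is taken, otherwise the next sequential instruction
    pc := Mux(io.jump_flag_id, io.jump_address_id, pc + 4.U)
  }
  // assumption: the PC keeps its value while the instruction is not valid

  io.instruction_address := pc
}
```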
The PC starts at address 0x1000. In the first test case no jump occurs, so the PC advances to PC + 4 to fetch the next instruction. In the second test case a jump to address 0x1000 is taken, so the PC is updated to 0x1000 in the next clock cycle.
Decode stage does:
- add, read two registers
- addi, read one register
- jal, no reads are necessary
At this stage, 8 signals need to be generated, and the remaining two outputs, namely memory_read_enable and memory_write_enable, have not been implemented yet.
These two signals appear to be associated with load and store instructions.
To finish their implementation, set memory_read_enable to true.B for L-type (load) instructions and memory_write_enable to true.B for S-type (store) instructions; otherwise, both keep the default value false.B.
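A minimal sketch of that rule, written as a standalone module so it can be compiled on its own. The class name MemEnableSketch and the instruction port are assumptions made for this example; in the lab these two assignments would live inside the existing decode module. The opcode constants mirror the L and S values of the InstructionTypes object.

```scala
import chisel3._

// Standalone illustration of the two memory enable signals (hypothetical module).
class MemEnableSketch extends Module {
  val io = IO(new Bundle {
    val instruction         = Input(UInt(32.W)) // assumed port name
    val memory_read_enable  = Output(Bool())
    val memory_write_enable = Output(Bool())
  })

  // opcode values for load (L) and store (S) instructions
  val opcodeL = "b0000011".U
  val opcodeS = "b0100011".U

  // the opcode is the lowest 7 bits of a RISC-V instruction
  val opcode = io.instruction(6, 0)

  // each comparison is true.B for a match and false.B otherwise
  io.memory_read_enable  := opcode === opcodeL
  io.memory_write_enable := opcode === opcodeS
}
```

Driving the outputs directly from the comparison gives the false.B default for every other opcode without an explicit otherwise branch.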
A warning occurs during compilation:
method apply in object MuxLookup is deprecated (since Chisel 3.6): Use MuxLookup(key, default)(mapping) instead
To address this warning, move the mapping sequence out of the first argument list and into a second one, as the message suggests.
val immediate = MuxLookup(
  opcode,
  Cat(..., ...)
) {
  IndexedSeq(
    ...,
    ...
  )
}
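Since the snippet above elides the actual entries, here is a small complete example of the two-argument-list form, assuming Chisel 3.6 or later. It is not the lab's decoder: the module name ImmSketch, the restriction to I-type and S-type immediates, and the use of the I-type immediate as the default are choices made for this illustration.

```scala
import chisel3._
import chisel3.util._

// Illustration of MuxLookup(key, default)(mapping); hypothetical module.
class ImmSketch extends Module {
  val io = IO(new Bundle {
    val instruction = Input(UInt(32.W))
    val immediate   = Output(UInt(32.W))
  })
  val inst   = io.instruction
  val opcode = inst(6, 0)

  // I-type: imm[11:0] = inst[31:20], sign-extended to 32 bits
  val immI = Cat(Fill(20, inst(31)), inst(31, 20))
  // S-type: imm[11:5] = inst[31:25], imm[4:0] = inst[11:7], sign-extended
  val immS = Cat(Fill(20, inst(31)), inst(31, 25), inst(11, 7))

  // key and default go in the first argument list, the mapping in the second
  io.immediate := MuxLookup(opcode, immI)(
    Seq(
      "b0000011".U -> immI, // loads
      "b0010011".U -> immI, // register-immediate arithmetic
      "b0100011".U -> immS  // stores
    )
  )
}
```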
Three test cases:
object InstructionTypes {
  val L = "b0000011".U // 0x3
  val I = "b0010011".U
  val S = "b0100011".U // 0x23
  val RM = "b0110011".U
  val B = "b1100011".U
}
According to our design specification, when the opcode is 0x3, the signal memory_read_enable should be set to true.B, and when the opcode is 0x23, the signal memory_write_enable should be set to true.B. The waveform above confirms this behavior.
Execution stage does:
The control line of the ALU, alu.io.func, is driven by the output of the ALU control module, alu_ctrl.io.alu_funct. The two ALU inputs are selected by the control lines aluop1_source and aluop2_source, each of which drives one Mux.
The first test cases use the ADD instruction to check that the ALU works correctly. The final two tests use the BEQ instruction and cover both the taken and the not-taken case. When the branch is taken, the jump address is PC + 2, which equals 0x4 in this test.
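The BEQ arithmetic can be written out as a small plain-Scala model (not Chisel). The PC value of 2 is inferred from the statement that PC + 2 equals 0x4, and the register values 9 and 8 are arbitrary numbers chosen for this illustration. The names BeqModel and beq are hypothetical.

```scala
// Plain-Scala model of the BEQ decision (illustration only).
object BeqModel {
  // Returns (jumpFlag, jumpAddress) for a BEQ at address `pc` with branch offset `imm`.
  // The address is truncated to 32 bits, like a 32-bit adder would do.
  def beq(pc: Long, imm: Long, rs1: Long, rs2: Long): (Boolean, Long) =
    (rs1 == rs2, (pc + imm) & 0xFFFFFFFFL)

  def main(args: Array[String]): Unit = {
    // equal operands: the branch is taken
    println(beq(pc = 2, imm = 2, rs1 = 9, rs2 = 9))
    // different operands: the branch is not taken
    println(beq(pc = 2, imm = 2, rs1 = 9, rs2 = 8))
  }
}
```

In this model the target address is computed in both cases, and only the flag decides whether the PC actually uses it.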
With the modules for each stage complete, the next step is to connect the inputs and outputs of the stages. Once this integration is done, the single-cycle RISC-V CPU is complete.
[info] Run completed in 10 seconds, 938 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 12 s, completed Dec 3, 2023, 12:05:01 AM