contributed by <GliAmanti>
My OS: Ubuntu 22.04 LTS
sudo apt install build-essential verilator gtkwave
curl -s "https://get.sdkman.io" | bash
source "/home/cgvsl/.sdkman/bin/sdkman-init.sh"
sdk install java 11.0.21-tem
sdk install sbt
Remember to repeat step 2 every time you open a new terminal to run sbt.
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
led is the output of the Hello class. CNT_MAX is the maximum counter value. cntReg is a counter. blkReg is the current state of the LED.
cntReg increases by 1 every clock cycle. When cntReg equals CNT_MAX, namely 24999999, it is reset to 0 and blkReg toggles its state. The state of blkReg is assigned to the output led.
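The counter-and-toggle behavior described above can be sketched as a plain Scala software model. This is only an illustration, not the Chisel module: BlinkModel and simulate are hypothetical names, and a small cntMax is used so the toggle is visible within a few cycles.

```scala
// Software-only model of the Hello blinker; no Chisel dependency.
// BlinkModel and simulate are illustrative names, not part of the lab code.
object BlinkModel {
  // Returns the LED value observed on each of `cycles` consecutive clock cycles.
  def simulate(cntMax: Long, cycles: Int): Seq[Int] = {
    var cnt = 0L // models cntReg
    var blk = 0  // models blkReg
    (0 until cycles).map { _ =>
      val led = blk          // io.led := blkReg (value visible in this cycle)
      if (cnt == cntMax) {   // when(cntReg === CNT_MAX)
        cnt = 0
        blk = blk ^ 1        // 1-bit toggle, like ~blkReg
      } else {
        cnt += 1             // cntReg := cntReg + 1.U
      }
      led
    }
  }
}
```

With cntMax = 2, BlinkModel.simulate(2, 8) yields 0, 0, 0, 1, 1, 1, 0, 0, so the LED toggles every cntMax + 1 cycles. If the clock runs at 50 MHz (an assumption suggested by the constant 50000000, not stated above), CNT_MAX = 24999999 would give one toggle every 0.5 s.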
We only have to fill the blanks in InstructionFetch.scala, InstructionDecode.scala, Execute.scala and CPU.scala.
Here is my implementation of lab3, which is forked from ca2023-lab3.
The following figure is RV32I datapath.
git clone https://github.com/GliAmanti/ComputerArchitecture_HW3.git
cd ComputerArchitecture_HW3
Run all tests in the ComputerArchitecture_HW3 directory.
sbt test
The output message will be:
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project computerarchitecture_hw3-build from plugins.sbt ...
[info] loading project definition from /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/)
[info] compiling 1 Scala source to /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/target/scala-2.13/test-classes ...
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 6 seconds, 952 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 30, 2023, 1:24:28 PM
To run a single test, for example InstructionDecoderTest, execute the following command:
sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
The output message will be:
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project computerarchitecture_hw3-build from plugins.sbt ...
[info] loading project definition from /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/)
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 2 seconds, 509 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 3 s, completed Nov 30, 2023, 1:29:16 PM
This test verifies whether the InstructionFetch module brings PC to the right address.
There are 2 cases:
Case 1: jump_flag_id will be set to false, then PC := PC + 4.
Here jump_flag_id is 0, so instruction_address changes from 1000 to 1004.
Case 2: jump_flag_id will be set to true, then PC := jump_address_id. PC will jump to entry, namely 0x1000.
Here jump_flag_id is 1, so instruction_address changes from 1008 to 1000.

This test verifies whether the InstructionDecoder module accurately distinguishes the opcode, passing the right data for each corresponding instruction.
There are many cases but this test only verifies 3:

This kind of instruction adds the contents of rs1 and simm12, regarding the result as a memory address, so:
aluop1_source := ALUOp1Source.Register. That is, choosing gate 0.
aluop2_source := ALUOp2Source.Immediate. That is, choosing gate 1.
The gate numbers follow the definition in InstructionDecoder.
object ALUOp1Source {
val Register = 0.U(1.W)
val InstructionAddress = 1.U(1.W)
}
object ALUOp2Source {
val Register = 0.U(1.W)
val Immediate = 1.U(1.W)
}
sw/sh/sb rs2, rs1, simm12
The decoder identifies 00A02223 as S-type by the last 7 bits, which are 010 0011 in binary. So memory_write_enable will be set to 1. reg1_read_address = 0 and immediate = 4 will be passed to the next stage.

This instruction loads uimm20 to the upper 20 bits of rd, and sets the rest of the bits to 0, so:
aluop1_source := ALUOp1Source.Register. That is, choosing gate 0.
aluop2_source := ALUOp2Source.Immediate. That is, choosing gate 1.
lui rd, uimm20

This instruction adds the contents of rs1 and rs2, so:
aluop1_source := ALUOp1Source.Register. That is, choosing gate 0.
aluop2_source := ALUOp2Source.Register. That is, choosing gate 0.
add rd, rs1, rs2
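
The field extraction behind the S-type case above (00A02223) can be checked with a small plain Scala sketch. It follows the standard RV32I S-type layout; STypeDecode and SFields are hypothetical names used only for illustration and are not taken from the lab's InstructionDecoder.

```scala
// Software-only sketch of S-type field extraction (RV32I layout).
// STypeDecode and SFields are illustrative names, not part of the lab code.
object STypeDecode {
  final case class SFields(opcode: Int, funct3: Int, rs1: Int, rs2: Int, imm: Int)

  def decode(inst: Int): SFields = {
    val opcode = inst & 0x7f            // inst[6:0]
    val funct3 = (inst >>> 12) & 0x7    // inst[14:12]
    val rs1    = (inst >>> 15) & 0x1f   // inst[19:15]
    val rs2    = (inst >>> 20) & 0x1f   // inst[24:20]
    // imm[11:5] = inst[31:25], imm[4:0] = inst[11:7];
    // the arithmetic shift (>>) sign-extends the upper part.
    val imm    = ((inst >> 25) << 5) | ((inst >>> 7) & 0x1f)
    SFields(opcode, funct3, rs1, rs2, imm)
  }
}
```

For example, STypeDecode.decode(0x00A02223) gives opcode 0x23 (binary 010 0011), rs1 = 0 and imm = 4, matching the values that the test expects to be passed on; rs2 = 10 and funct3 = 2 identify the instruction as sw x10, 4(x0).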
This test verifies whether the Execute module makes the right decision for branch instructions.
There are many cases but this test only verifies add and beq.

ALU adds op1 and op2, according to the funct.
op1 is set to reg1_data, op2 is set to reg2_data. Both of them are random integers.
jump_flag_id will be set to false.

Branch Comp. compares the contents of rs1 and rs2.
ALU computes the jump address, so:
aluop1_source := ALUOp1Source.InstructionAddress. That is, choosing gate 1.
aluop2_source := ALUOp2Source.Immediate. That is, choosing gate 1.
beq rs1, rs2, simm13
If reg1_data == reg2_data, jump_flag_id will be set to true.
jump_address := immediate + instruction_address, here jump_address := 2 + 2.
With reg1_data = 9 and reg2_data = 9, the equal condition is satisfied. So jump_flag is triggered.
If reg1_data != reg2_data, jump_flag_id will be set to false.
jump_address := immediate + instruction_address, here jump_address := 2 + 2.
With reg1_data = 17FE18F6 and reg2_data = 15D9D5DD, the equal condition is not satisfied. So jump_flag is not triggered.

This test verifies whether all components in MyCPU function properly.
There are 3 test cases in CPUTest:
fibonacci.asmbin, which calculates Fibonacci(10).
mem_debug_read_address will be set to 4, then mem_debug_read_data will get 55.
quicksort.asmbin, which performs a Quick Sort on 10 numbers.
mem_debug_read_address will be set to 4 * i, then mem_debug_read_data will get i - 1.
sb.asmbin, which stores and loads a single byte.
regs_debug_read_address will be set to 5, then regs_debug_read_data will get 0xdeadbeef.
regs_debug_read_address will be set to 6, then regs_debug_read_data will get 0xef.
regs_debug_read_address will be set to 1, then regs_debug_read_data will get 0x15ef.
withclock() will be implicit reset, so some signals, such as reg_debug_address and reg_debug_data, won't show a waveform in gtkwave.
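
The next-PC rule from InstructionFetchTest and the beq decision from ExecuteTest, both described above, can be summarized in a plain Scala sketch. This is a software model under the stated rules only, not the Chisel implementation; BranchModel, nextPc and beq are hypothetical names.

```scala
// Software-only model of the fetch and branch rules described above.
// BranchModel, nextPc and beq are illustrative names, not part of the lab code.
object BranchModel {
  // InstructionFetch: PC := jump_address_id when jump_flag_id is set, else PC + 4.
  def nextPc(pc: Int, jumpFlag: Boolean, jumpAddress: Int): Int =
    if (jumpFlag) jumpAddress else pc + 4

  // Execute for beq: returns (jump flag, jump address).
  // The flag comes from comparing the two register values;
  // the address is immediate + instruction_address.
  def beq(reg1Data: Int, reg2Data: Int, instructionAddress: Int, immediate: Int): (Boolean, Int) =
    (reg1Data == reg2Data, instructionAddress + immediate)
}
```

Using the values quoted above, BranchModel.nextPc(0x1000, false, 0) is 0x1004 and BranchModel.nextPc(0x1008, true, 0x1000) is 0x1000. BranchModel.beq(9, 9, 2, 2) raises the flag with address 4, while BranchModel.beq(0x17FE18F6, 0x15D9D5DD, 2, 2) computes the same address but leaves the flag low.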
Here is my adaptation of hw2.
To run hw2 with MyCPU, I do the following operations.
Remove the code related to rdcycle and rdcycleh in myHammingDist.S.
Store the results to the memory address I test in CPUTest.scala.
# base memory address to store the result
li s9, 0x4
......
# store the result for hw3
slli t1, s1, 2
add t0, s9, t1
sw a0, 0(t0)
Add the corresponding test to CPUTest.scala.
class HammingTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "calculate the hamming distance between two 64-bit integers" in {
    test(new TestTopModule("myHammingDist.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(21.U)
      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(63.U)
      c.io.mem_debug_read_address.poke(12.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)
    }
  }
}
If you check the result through a memory address, make sure that c.clock.step(1000) is inside your loop. Otherwise, you will get an error message.
Add myHammingDist.S to the csrc directory.
Modify the Makefile in the csrc directory.
CROSS_COMPILE ?= riscv-none-elf-
ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv
AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy
%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<
%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)
%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o
%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@
BINS = \
fibonacci.asmbin \
hello.asmbin \
mmio.asmbin \
quicksort.asmbin \
sb.asmbin \
+ myHammingDist.asmbin
# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=
all: $(BINS)
update: $(BINS)
	cp -f $(BINS) ../src/main/resources
clean:
	$(RM) *.o *.elf *.asmbin
Generate myHammingDist.asmbin.
make update
or
make clean
make
Run the test.
sbt "testOnly riscv.singlecycle.HammingTest"
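
To cross-check the values that HammingTest expects, a host-side reference can be written in plain Scala. This is a sketch, not the assembly algorithm in myHammingDist.S: HammingReference, distance and resultAddress are hypothetical names, and treating s1 as a zero-based test-case index is an assumption based on the addresses 4, 8 and 12 read by the test.

```scala
// Host-side reference for the Hamming distance and the result addresses.
// HammingReference, distance and resultAddress are illustrative names.
object HammingReference {
  // Number of bit positions in which the two 64-bit values differ.
  def distance(a: Long, b: Long): Int = java.lang.Long.bitCount(a ^ b)

  // Mirrors the stores shown above: base 0x4 (li s9, 0x4) plus index * 4
  // (slli t1, s1, 2). Assumes s1 holds a zero-based test-case index.
  def resultAddress(index: Int): Int = 0x4 + (index << 2)
}
```

For instance, HammingReference.distance(0L, Long.MaxValue) is 63 and the distance between any value and itself is 0. These inputs are illustrative only; the actual test data lives in myHammingDist.S. The first three result addresses are 4, 8 and 12, which are the addresses poked in HammingTest.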
After the first run and every time you modify the Chisel code, you need to execute the following command in the project’s root directory to generate Verilog files.
make verilator
Load myHammingDist.asmbin, run the simulation with -time 1000, and save the simulation waveform to myHammingDist.vcd.
./run-verilator.sh -instruction src/main/resources/myHammingDist.asmbin -time 1000 -vcd myHammingDist.vcd
The output message will be:
-time 1000
-memory 1048576
-instruction src/main/resources/myHammingDist.asmbin
[-------------------->] 100%
Use GTKWave to view the output waveform file myHammingDist.vcd
gtkwave myHammingDist.vcd
I take the instruction lw a0, 0(s2) as an example and analyze how MyCPU handles it in each stage.
00092503 is lw a0, 0(s2). (Line 52 in my code.)
It is neither an SB-type nor a UJ-type instruction, so:
jump_flag_id = 0, PC = PC + 4.

lw will load data from the memory address rs1 + imm to rd, so (register numbers below are in hexadecimal):
memory_read_enable = 1, memory_write_enable = 0
reg_write_address = A (a0 = x10)
reg_write_enable = 1
rs1 = 12 (s2 = x18)
rd = A
immediate = 0

According to ALUControl and ALU, alu_funct = ALUFunctions.add, namely 1.
is(InstructionTypes.L) {
io.alu_funct := ALUFunctions.add
}
object ALUFunctions extends ChiselEnum {
val zero, add, sub, sll, slt, xor, or, and, srl, sra, sltu = Value
}
aluop1_source = 0, so op1 = reg1_data.
aluop2_source = 1, so op2 = immediate.
ALU adds op1 = FFFFFFF4 and op2 = 00000000, getting the result = FFFFFFF4.

memory_read_enable = 1, so Memory will output read_data = 00000000, which corresponds to the memory address FFFFFFF4.

According to WriteBack and InstructionDecode, regs_write_source = 1, so regs_write_data = memory_read_data, namely 00000000.
io.regs_write_data := MuxLookup(
io.regs_write_source,
io.alu_result,
IndexedSeq(
RegWriteSource.Memory -> io.memory_read_data,
RegWriteSource.NextInstructionAddress -> (io.instruction_address + 4.U)
)
)
object RegWriteSource {
val ALUResult = 0.U(2.W)
val Memory = 1.U(2.W)
val NextInstructionAddress = 3.U(2.W)
}
write_address = A.
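
The lw walk-through above can be replayed with a plain Scala sketch: decode the I-type fields, add the base register value and the immediate, then pick the write-back value. This is a software model under the standard RV32I I-type layout, not the Chisel code; LwWalkthrough and its members are hypothetical names, and the write-back selector values 1 and 3 are taken from RegWriteSource as listed above.

```scala
// Software-only replay of the lw a0, 0(s2) walk-through.
// LwWalkthrough and its members are illustrative names, not part of the lab code.
object LwWalkthrough {
  final case class IFields(opcode: Int, rd: Int, funct3: Int, rs1: Int, imm: Int)

  // ID stage: extract the I-type fields.
  def decode(inst: Int): IFields = IFields(
    opcode = inst & 0x7f,           // inst[6:0]
    rd     = (inst >>> 7) & 0x1f,   // inst[11:7]
    funct3 = (inst >>> 12) & 0x7,   // inst[14:12]
    rs1    = (inst >>> 15) & 0x1f,  // inst[19:15]
    imm    = inst >> 20             // inst[31:20], sign-extended by the arithmetic shift
  )

  // EX stage for a load: the ALU adds rs1's value and the immediate.
  def effectiveAddress(rs1Value: Int, imm: Int): Int = rs1Value + imm

  // WB stage: same selection as the MuxLookup above, with the ALU result as default.
  def writeBackData(source: Int, aluResult: Int, memoryReadData: Int, instructionAddress: Int): Int =
    source match {
      case 1 => memoryReadData          // RegWriteSource.Memory
      case 3 => instructionAddress + 4  // RegWriteSource.NextInstructionAddress
      case _ => aluResult               // RegWriteSource.ALUResult and any other value
    }
}
```

LwWalkthrough.decode(0x00092503) gives rd = 10 (hex A), rs1 = 18 (hex 12) and imm = 0. With the base value FFFFFFF4 the effective address stays FFFFFFF4, and with source = 1 the write-back value is the memory read data, 00000000 in this run.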