Contirbuted by <bclegend
>
This is my repository with completed MyCPU.
sbt test
[info] welcome to sbt 1.9.7 (Oracle Corporation Java 17.0.6)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/player1/Desktop/ca2023-lab3/projec
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/player1/Desktop/ca2023-lab3/)
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 9 seconds, 291 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Dec 1, 2023, 6:41:54 PM
Test InstructionFetch_of_Single_Cycle_CPU_should_fetch_instruction
The test will check the io_jump_flag
will or will not chang between 1
and 0
when the jump has happened.
clock
signal is 1 we can see the instruction_address
is 0x1000
.clock
signal is 0 the instruction_address
won't change.clock
signal turn to 1 agnain instruction_address
add 4 to 0x1004
.Test InstructionDecoder_of_Single_Cycle_CPU_should_produce_correct_control_signal
This will test at this stage will or will not perform S-type
lui
add
instructure as expect.
io_instruction = 0xA02223
and we know it issw x10, 4(x0)
.io_memory_write_enable = 1
is match our expect.io_instruction = 0x22B7
and we know it islui x5, 2
.io_reg_write_enable = 1
is match our expect and the io_reg_write_address = 5
is match the x5
register.Execute.scala
In this stage, we will receive data from the Instruction Decoder. The red line of ALUOp1Src is used to determine whether the ALU should receive data from Register 1 or the instruction address. Similarly, ALUOp2Src is utilized to determine whether the ALU should receive data from Register 2 or the immediate value.
Test Execution_of_Single_Cycle_CPU_should_execute_correctly
This stage will test add
and beq
instruction when equ
and not equ
happened the funtion will perform as expect or not.
io_instruction = 0x1101B3
we can know it will be add x3, x2, x1
and we can see opcode=0x33
will match the expect with add
.io_instruction = 0x208163
and we can see the io_if_jump_flag = 1
.Test Register_File_of_Single_Cycle_CPU_should_read_the_written_content
In this test we will look for the if the stage can read the written content.
io_wirte_data = 0xDEADBEEF
and io_write_enable = 1
when the io_wirte_address = 1
whichregisters_1 = 0
.clock
the registers_1
changed registers_1 = 0xDEADBEEF
.Test Register_File_of_Single_Cycle_CPU_should_x0_always_be_zero
In this test we will look for if the register x0 be written will still be 0.
io_wirte_data = 0xDEADBEEF
and io_write_enable = 1
when io_wirte_address = 1
which point at registers_0 = 0
.clock
we can see the registers_0 = 0
still remain 0 and this will match our expect.Test Register_File_of_Single_Cycle_CPU_should_read_the_writing_content
In this test we will look for the if the stage can read the writting content.
io_wirte_data = 0xDEADBEEF
and the io_wirte_address = 2
and we can notice that registers_2 = 0
the value of this registers still not change.clock
we can notice thatregisters_2 = 0
change to 0xDEADBEEF
and at the same clock
the value of io_read_data1 = 0xDEADBEEF
, the value read and write at the same time.CPU.scala
There are three test in CPU.scala, and I will choose quicksort
to illustrate it
Single_Cycle_CPU_should_perform_a_quicksort_on_10_numbers
Ongoing
Modify the code to to fit MyCPU.
.org 0
.global _start
/* newlib system calls */
.set STDOUT,1
.set SYSEXIT, 93
.set SYSWRITE, 64
.data
# will not overflow, and will predict as false
cmp_data_1: .dword 0x0000000000000000, 0x0000000000000000
# will not overflow, and will predict as false
cmp_data_2: .dword 0x0000000000000001, 0x0000000000000010
# will not overflow, but will predict as true
cmp_data_3: .dword 0x0000000000000002, 0x4000000000000000
# will overflow, and will predict as true
cmp_data_4: .dword 0x0000000000000003, 0x7FFFFFFFFFFFFFFF
nextline: .ascii "\n"
.set str_next_len, .-nextline
blank: .ascii " "
.set str_blank_len, .-blank
.text
# assume little endian
_start:
addi sp, sp, -16
# push four pointers of test data onto the stack
la t0, cmp_data_1
sw t0, 0(sp)
la t0, cmp_data_2
sw t0, 4(sp)
la t0, cmp_data_3
sw t0, 8(sp)
la t0, cmp_data_4
sw t0, 12(sp)
addi s0, zero, 4 # s0 is the goal iteration count
addi s1, zero, 0 # s1 is the counter
addi s2, sp, 0 # s2 now points to cmp_data_1
main_loop:
lw a0, 0(s2) # a0 stores the pointer to first data in cmp_data_x
addi a1, a0, 8 # a1 stores the pointer to second data in cmp_data_x
jal ra, pimo
### print for rv32emu
addi a1, a0, 48
addi sp, sp, -4
sw a1, 0(sp)
addi a1, sp, 0
li a7, SYSWRITE
li a0, STDOUT
li a2, 4
# ecall
addi sp,sp,4
###
# printf("\n");
li a7, SYSWRITE
li a0, 1
la a1, nextline
li a2, 1
# ecall
addi s2, s2, 4 # s2 points to next cmp_data_x
addi s1, s1, 1 # counter++
bne s1, s0, main_loop
addi sp, sp, 16
j exit
# predict if multiplication overflow:
pimo:
addi sp, sp, -20
sw ra, 0(sp)
sw s0, 4(sp)
sw s1, 8(sp)
sw s2, 12(sp)
sw s3, 16(sp)
mv s0, a0 # s0 is address of x0
mv s1, a1 # s1 is address of x1
lw a0, 0(s0)
lw a1, 4(s0) # a0 a1 is now the value of x0
jal ra, clz
li s2, 63
sub s2, s2, a0 # s2 is now exp_x0
lw a0, 0(s1)
lw a1, 4(s1) # a1 a0 is now the value of x1
jal ra, clz
li s3, 63
sub s3, s3, a0 # s3 is now exp_x1
add s2, s2, s3
addi s2, s2, 2 # s2 is (exp_x0 + 1) + (exp_x1 + 1)
li t0, 64
bge s2, t0, pimo_ret_t
li a0, 0 # return false
j pimo_end
pimo_ret_t:
li a0, 1 # return true
pimo_end:
lw ra, 0(sp)
lw s0, 4(sp)
lw s1, 8(sp)
lw s2, 12(sp)
lw s3, 16(sp)
addi sp, sp, 20
ret
# count leading zeros
clz:
addi sp, sp, -4
sw ra, 0(sp)
# a0 a1 = x
bne a1, zero, clz_fill_ones_upper
clz_fill_ones_lower:
srli t0, a0, 1
or a0, a0, t0
srli t0, a0, 2
or a0, a0, t0
srli t0, a0, 4
or a0, a0, t0
srli t0, a0, 8
or a0, a0, t0
srli t0, a0, 16
or a0, a0, t0
j clz_fill_ones_end
clz_fill_ones_upper:
srli t1, a1, 1
or a1, a1, t1
srli t1, a1, 2
or a1, a1, t1
srli t1, a1, 4
or a1, a1, t1
srli t1, a1, 8
or a1, a1, t1
srli t1, a1, 16
or a1, a1, t1
li a0, 0xffffffff
clz_fill_ones_end:
# x -= ((x >> 1) & 0x5555555555555555);
srli t0, a0, 1
slli t1, a1, 31
or t0, t0, t1
srli t1, a1, 1 # t0 t1 = x >> 1
li t2, 0x55555555 # t2 is the mask
and t0, t0, t2
and t1, t1, t2 # t0 t1 = (x >> 1) & 0x5555555555555555
sltu t3, a0, t0 # t3 is the borrow bit
sub a0, a0, t0
sub a1, a1, t1
sub a1, a1, t3 # a0 a1 = x - (t0 t1)
# x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
srli t0, a0, 2
slli t1, a1, 30
or t0, t0, t1
srli t1, a1, 2 # t0 t1 = x >> 2
li t2, 0x33333333 # t2 is the mask
and t0, t0, t2
and t1, t1, t2 # t0 t1 = (x >> 2) & 0x3333333333333333
and t4, a0, t2
and t5, a1, t2 # t4 t5 = x & 0x3333333333333333
add a0, t0, t4
sltu t3, a0, t0 # t3 is the carry bit
add a1, t1, t5
add a1, a1, t3 # a0 a1 = (t0 t1) + (t4 t5)
# x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
srli t0, a0, 4
slli t1, a1, 28
or t0, t0, t1
srli t1, a1, 4 # t0 t1 = x >> 4
add t0, t0, a0
sltu t3, t0, a0 # t3 is the carry bit
add t1, t1, a1
add t1, t1, t3 # t0 t1 = (x >> 4) + x
li t2, 0x0f0f0f0f # t2 is the mask
and a0, t0, t2
and a1, t1, t2 # a0 a1 = (t0 t1) & 0x0f0f0f0f0f0f0f0f
# x += (x >> 8);
srli t0, a0, 8
slli t1, a1, 24
or t0, t0, t1
srli t1, a1, 8 # t0 t1 = x >> 8
add a0, a0, t0
sltu t3, a0, t0 # t3 is the carry bit
add a1, a1, t1
add a1, a1, t3 # a0 a1 = x + (x >> 8)
# x += (x >> 16);
srli t0, a0, 16
slli t1, a1, 16
or t0, t0, t1
srli t1, a1, 16 # t0 t1 = x >> 16
add a0, a0, t0
sltu t3, a0, t0 # t3 is the carry bit
add a1, a1, t1
add a1, a1, t3 # a0 a1 = x + (x >> 16)
# x += (x >> 32);
mv t0, a1
mv t1, zero # t0 t1 = x >> 32
add a0, a0, t0
sltu t3, a0, t0 # t3 is the carry bit
add a1, a1, t1
add a1, a1, t3 # a0 a1 = x + (x >> 32)
# return (64 - (x & 0x7f));
andi a0, a0, 0x7f # a0 = (x & 0x7f)
li t0, 64
sub a0, t0, a0 # a0 = (64 - (x & 0x7f))
lw ra, 0(sp)
addi sp, sp, 4
ret
exit:
li a7, SYSEXIT
addi a0, x0, 0
# ecall
Remove ecall
from original code.
Move hw02.S
to ./CA2023-LAB3/csrc
and make to create hw02.asmbin
.
Add the code on CPUTest.scala
class Hw02Test extends AnyFlatSpec with ChiselScalatestTester {
behavior.of("Single Cycle CPU")
it should "Run the Hw02 Test." in {
test(new TestTopModule("hw02.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
for (i <- 1 to 500) {
c.clock.step(1000)
c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
}
c.clock.step()
c.io.mem_debug_read_data.expect(0.U)
}
}
}
ββββThe first instruciton is `addi sp, sp, -16`
Instruction Fetch
io_instruction_read_data = 0xff010113
and the decode of this instruction will be addi sp, sp, -16
.Instruciton Decode
Execute
Memory Access
Write Back
Registers
Verilator