
Assignment3: single-cycle RISC-V CPU

contributed by < eeeXun >

Installation

On Arch Linux

sudo pacman -S sbt jdk17-openjdk verilator gtkwave

The current version of the packages

pacman -Q | grep "sbt\|jdk\|verilator\|gtkwave"
gtkwave 3.3.117-1
jdk17-openjdk 17.0.9.u8-2
jre17-openjdk 17.0.9.u8-2
jre17-openjdk-headless 17.0.9.u8-2
sbt 1:1.8.3-1
verilator 5.018-1

Pass all tests

At the beginning, I had no idea where to start, so I looked into the tests that failed.

In InstructionFetchTest, when jump_flag_id is asserted, instruction_address should become entry, and the test passes entry in through instruction_address_id.
When jump_flag_id is deasserted, instruction_address should be cur, where cur is prev + 4.
With that, I could make InstructionFetchTest pass.
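The selection rule above can be summarized as a small pure-Scala model. This is a sketch of the behavior the test checks, not the lab's Chisel code (in Chisel the same choice would typically be a Mux in front of the pc register), and the entry address 0x1000 in main is a hypothetical value for the demo.

```scala
// Pure-Scala model of the next-PC rule that InstructionFetchTest checks.
object FetchModel {
  // Jump target when the jump flag is set, otherwise the next sequential word.
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (jumpFlag) jumpAddress else (pc + 4) & 0xffffffffL // wrap at 32 bits

  def main(args: Array[String]): Unit = {
    val entry = 0x1000L // hypothetical entry address
    var pc = entry
    // Two sequential fetches, then a jump back to entry.
    for ((flag, target) <- Seq((false, 0L), (false, 0L), (true, entry))) {
      pc = nextPc(pc, flag, target)
      println(f"pc = 0x$pc%08x")
    }
  }
}
```

Running main prints 0x00001004, 0x00001008 and then 0x00001000 after the jump.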

When it came to InstructionDecoderTest, I saw that the test only examines four values: ex_aluop1_source, ex_aluop2_source, regs_reg1_read_address and regs_reg2_read_address. However, these four signals were already assigned in InstructionDecode, and I believed they were all assigned correctly.
The error message shows:

[info] - should produce correct control signal *** FAILED ***
[info]    : io.memory_write_enable <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/InstructionDecode.scala 127:14] : [module InstructionDecode]  Reference io is not fully initialized.

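The message says memory_write_enable is not initialized, but I still had no idea why that would make the test fail. In the bootcamp, I had only seen failures where a value did not match the one passed to expect. In hindsight, this is an elaboration error rather than a value mismatch: FIRRTL's CheckInitialization pass rejects a module whose outputs are not all driven, so the test fails before any expect is evaluated.

So I went back to Lab3 and found a diagram I had missed. It shows all the signals required for this assignment, and after following it, all the failing tests were quickly resolved.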
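The fix is to give every output of InstructionDecode a value. As a rough illustration of what the memory enables depend on, here is a plain-Scala model built from the RV32I opcodes (LOAD = 0x03, STORE = 0x23). It is not the lab's Chisel code, and memory_read_enable is included only as an assumed companion to memory_write_enable.

```scala
// Plain-Scala model of the load/store enables derived from the opcode.
object DecodeModel {
  val OpLoad  = 0x03 // lb, lh, lw, lbu, lhu
  val OpStore = 0x23 // sb, sh, sw

  def opcode(inst: Int): Int = inst & 0x7f

  // Both enables get a value for every instruction, so neither is left undriven.
  def memoryReadEnable(inst: Int): Boolean  = opcode(inst) == OpLoad
  def memoryWriteEnable(inst: Int): Boolean = opcode(inst) == OpStore

  def main(args: Array[String]): Unit = {
    val sw = 0x01012023 // sw a6,0(sp)
    val lw = 0x00062803 // lw a6,0(a2)
    println((memoryReadEnable(sw), memoryWriteEnable(sw))) // (false,true)
    println((memoryReadEnable(lw), memoryWriteEnable(lw))) // (true,false)
  }
}
```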

Assignment 2

In the assembly code of Assignment 2, I removed all the instructions related to ecall and changed exit into an infinite loop:

exit:
    j exit

The program contains only one test case and stores its result in register s3, so my test only checks whether s3 holds the correct value.

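// Assumed to live next to the lab's other tests, where TestTopModule
// and TestAnnotations are defined.
package riscv.singlecycle

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec
import riscv.TestAnnotations

class HW2Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "multiply two bfloat16" in {
    test(new TestTopModule("hw2.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      // Run 50 x 1000 cycles. Poking an input every iteration keeps
      // chiseltest's idle timeout (too many cycles without a poke) from firing.
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U)
      }
      c.io.regs_debug_read_address.poke(19.U) // x19 = s3, which holds the result
      c.io.regs_debug_read_data.expect(0x440a0000.U)
    }
  }
}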
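To double-check where 0x440a0000 comes from, the sketch below converts the two test0 words from the data section to bfloat16 and multiplies them with host float arithmetic. It is only a reference check, not the assembly's shift-and-add multiplier, and the rounding step (adding 0x8000 before truncating) is my reading of f32_b16_p1; the two may disagree on other inputs.

```scala
// Reference check for the expected s3 value using host float arithmetic.
object Bf16Check {
  // Add 0x8000, then keep the upper 16 bits (round half up on the magnitude).
  // Assumed to mirror the `lui t6,0x8; add t2,t2,t6` step in f32_b16_p1.
  // Not meant for NaN/Inf inputs.
  def toBf16Bits(f32Bits: Int): Int = (f32Bits + 0x8000) & ~0xffff

  def mulBf16(aBits: Int, bBits: Int): Int = {
    val a = java.lang.Float.intBitsToFloat(toBf16Bits(aBits))
    val b = java.lang.Float.intBitsToFloat(toBf16Bits(bBits))
    toBf16Bits(java.lang.Float.floatToRawIntBits(a * b))
  }

  def main(args: Array[String]): Unit = {
    val r = mulBf16(0x4141f9a7, 0x423645a2) // test0 operands from the listing
    println(f"0x$r%08x = ${java.lang.Float.intBitsToFloat(r)}")
  }
}
```

Running main prints `0x440a0000 = 552.0`: the operands round to 12.125 and 45.5, their product 551.6875 rounds to 552 in bfloat16.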

Waveform During Testing

Startup

I loaded the VCD file of Assignment 2 and first compared the clock with instruction_address and instruction. I observed that it takes a long time before the first instruction starts.


Then I compared it with the memory and rom_loader signals.


The first instruction starts at almost exactly the time rom_loader stops changing. I then looked at the objdump output of Assignment 2 and compared it with what I observed in GTKWave.

objdump
hw2.o:     file format elf32-littleriscv


Disassembly of section .text:

00000000 <_start>:
   0:   00100893                li      a7,1
   4:   00000617                auipc   a2,0x0
   8:   00060613                mv      a2,a2
   c:   00062803                lw      a6,0(a2) # 4 <_start+0x4>
  10:   028000ef                jal     38 <f32_b16_p1>
  14:   000807b3                add     a5,a6,zero
  18:   00462803                lw      a6,4(a2)
  1c:   01c000ef                jal     38 <f32_b16_p1>
  20:   00080733                add     a4,a6,zero
  24:   0a8000ef                jal     cc <encoder>
  28:   00098cb3                add     s9,s3,zero
  2c:   0b8000ef                jal     e4 <decoder>
  30:   0d0000ef                jal     100 <Multi_bfloat>
  34:   25c0006f                j       290 <exit>

00000038 <f32_b16_p1>:
  38:   01012023                sw      a6,0(sp)
  3c:   000802b3                add     t0,a6,zero
  40:   7f800fb7                lui     t6,0x7f800
  44:   01f2f333                and     t1,t0,t6
  48:   00800fb7                lui     t6,0x800
  4c:   ffff8f93                add     t6,t6,-1 # 7fffff <str+0x7ffd53>
  50:   01f2f3b3                and     t2,t0,t6
  54:   7f800fb7                lui     t6,0x7f800
  58:   07f30463                beq     t1,t6,c0 <inf_or_zero>
  5c:   00736e33                or      t3,t1,t2
  60:   060e0063                beqz    t3,c0 <inf_or_zero>
  64:   00800fb7                lui     t6,0x800
  68:   01f3e3b3                or      t2,t2,t6
  6c:   00008fb7                lui     t6,0x8
  70:   01f383b3                add     t2,t2,t6
  74:   0183df13                srl     t5,t2,0x18
  78:   020f0063                beqz    t5,98 <no_overflow>
  7c:   00800fb7                lui     t6,0x800
  80:   01f30333                add     t1,t1,t6
  84:   0113d393                srl     t2,t2,0x11
  88:   07f00f93                li      t6,127
  8c:   01f3f3b3                and     t2,t2,t6
  90:   01039393                sll     t2,t2,0x10
  94:   0140006f                j       a8 <f32_b16_p2>

00000098 <no_overflow>:
  98:   0103d393                srl     t2,t2,0x10
  9c:   07f00f93                li      t6,127
  a0:   01f3f3b3                and     t2,t2,t6
  a4:   01039393                sll     t2,t2,0x10

000000a8 <f32_b16_p2>:
  a8:   01f2d293                srl     t0,t0,0x1f
  ac:   01f29293                sll     t0,t0,0x1f
  b0:   0062e2b3                or      t0,t0,t1
  b4:   0072e2b3                or      t0,t0,t2
  b8:   00028833                add     a6,t0,zero
  bc:   00008067                ret

000000c0 <inf_or_zero>:
  c0:   01085813                srl     a6,a6,0x10
  c4:   01081813                sll     a6,a6,0x10
  c8:   00008067                ret

000000cc <encoder>:
  cc:   000782b3                add     t0,a5,zero
  d0:   00070333                add     t1,a4,zero
  d4:   01035313                srl     t1,t1,0x10
  d8:   0062e2b3                or      t0,t0,t1
  dc:   000289b3                add     s3,t0,zero
  e0:   00008067                ret

000000e4 <decoder>:
  e4:   000c82b3                add     t0,s9,zero
  e8:   ffff0937                lui     s2,0xffff0
  ec:   0122f333                and     t1,t0,s2
  f0:   01029393                sll     t2,t0,0x10
  f4:   00030b33                add     s6,t1,zero
  f8:   00038ab3                add     s5,t2,zero
  fc:   00008067                ret

00000100 <Multi_bfloat>:
 100:   000a82b3                add     t0,s5,zero
 104:   000b0333                add     t1,s6,zero
 108:   7f800fb7                lui     t6,0x7f800
 10c:   01f2fe33                and     t3,t0,t6
 110:   01f373b3                and     t2,t1,t6
 114:   007e0e33                add     t3,t3,t2
 118:   3f800fb7                lui     t6,0x3f800
 11c:   41fe0e33                sub     t3,t3,t6
 120:   0062c3b3                xor     t2,t0,t1
 124:   01f3d393                srl     t2,t2,0x1f
 128:   01f39393                sll     t2,t2,0x1f
 12c:   007e6e33                or      t3,t3,t2
 130:   00929293                sll     t0,t0,0x9
 134:   0092d293                srl     t0,t0,0x9
 138:   005e62b3                or      t0,t3,t0
 13c:   007f0fb7                lui     t6,0x7f0
 140:   01f2f3b3                and     t2,t0,t6
 144:   01f37e33                and     t3,t1,t6
 148:   00839393                sll     t2,t2,0x8
 14c:   80000fb7                lui     t6,0x80000
 150:   01f3e3b3                or      t2,t2,t6
 154:   0013d393                srl     t2,t2,0x1
 158:   008e1e13                sll     t3,t3,0x8
 15c:   01fe6e33                or      t3,t3,t6
 160:   001e5e13                srl     t3,t3,0x1
 164:   00000333                add     t1,zero,zero
 168:   80000fb7                lui     t6,0x80000
 16c:   001fdf93                srl     t6,t6,0x1
 170:   01f3feb3                and     t4,t2,t6
 174:   01d03433                snez    s0,t4
 178:   40800433                neg     s0,s0
 17c:   01c474b3                and     s1,s0,t3
 180:   00930333                add     t1,t1,s1
 184:   001e5e13                srl     t3,t3,0x1
 188:   001fdf93                srl     t6,t6,0x1
 18c:   01f3feb3                and     t4,t2,t6
 190:   01d03433                snez    s0,t4
 194:   40800433                neg     s0,s0
 198:   01c474b3                and     s1,s0,t3
 19c:   00930333                add     t1,t1,s1
 1a0:   001e5e13                srl     t3,t3,0x1
 1a4:   001fdf93                srl     t6,t6,0x1
 1a8:   01f3feb3                and     t4,t2,t6
 1ac:   01d03433                snez    s0,t4
 1b0:   40800433                neg     s0,s0
 1b4:   01c474b3                and     s1,s0,t3
 1b8:   00930333                add     t1,t1,s1
 1bc:   001e5e13                srl     t3,t3,0x1
 1c0:   001fdf93                srl     t6,t6,0x1
 1c4:   01f3feb3                and     t4,t2,t6
 1c8:   01d03433                snez    s0,t4
 1cc:   40800433                neg     s0,s0
 1d0:   01c474b3                and     s1,s0,t3
 1d4:   00930333                add     t1,t1,s1
 1d8:   001e5e13                srl     t3,t3,0x1
 1dc:   001fdf93                srl     t6,t6,0x1
 1e0:   01f3feb3                and     t4,t2,t6
 1e4:   01d03433                snez    s0,t4
 1e8:   40800433                neg     s0,s0
 1ec:   01c474b3                and     s1,s0,t3
 1f0:   00930333                add     t1,t1,s1
 1f4:   001e5e13                srl     t3,t3,0x1
 1f8:   001fdf93                srl     t6,t6,0x1
 1fc:   01f3feb3                and     t4,t2,t6
 200:   01d03433                snez    s0,t4
 204:   40800433                neg     s0,s0
 208:   01c474b3                and     s1,s0,t3
 20c:   00930333                add     t1,t1,s1
 210:   001e5e13                srl     t3,t3,0x1
 214:   001fdf93                srl     t6,t6,0x1
 218:   01f3feb3                and     t4,t2,t6
 21c:   01d03433                snez    s0,t4
 220:   40800433                neg     s0,s0
 224:   01c474b3                and     s1,s0,t3
 228:   00930333                add     t1,t1,s1
 22c:   001e5e13                srl     t3,t3,0x1
 230:   001fdf93                srl     t6,t6,0x1
 234:   01f3feb3                and     t4,t2,t6
 238:   01d03433                snez    s0,t4
 23c:   40800433                neg     s0,s0
 240:   01c474b3                and     s1,s0,t3
 244:   00930333                add     t1,t1,s1
 248:   001e5e13                srl     t3,t3,0x1
 24c:   80000fb7                lui     t6,0x80000
 250:   01f37eb3                and     t4,t1,t6
 254:   000e8a63                beqz    t4,268 <not_overflow>
 258:   00131313                sll     t1,t1,0x1
 25c:   00800fb7                lui     t6,0x800
 260:   01f282b3                add     t0,t0,t6
 264:   0080006f                j       26c <Mult_end>

00000268 <not_overflow>:
 268:   00231313                sll     t1,t1,0x2

0000026c <Mult_end>:
 26c:   01835313                srl     t1,t1,0x18
 270:   00130313                add     t1,t1,1
 274:   00135313                srl     t1,t1,0x1
 278:   01031313                sll     t1,t1,0x10
 27c:   0172d293                srl     t0,t0,0x17
 280:   01729293                sll     t0,t0,0x17
 284:   0062e2b3                or      t0,t0,t1
 288:   000289b3                add     s3,t0,zero
 28c:   00008067                ret

00000290 <exit>:
 290:   0000006f                j       290 <exit>

00000294 <test0>:
 294:   4141f9a7                .word   0x4141f9a7
 298:   423645a2                .word   0x423645a2

0000029c <test1>:
 29c:   3fa66666                .word   0x3fa66666
 2a0:   42c63333                .word   0x42c63333

000002a4 <test2>:
 2a4:   43e43a5e                .word   0x43e43a5e
 2a8:   42b1999a                .word   0x42b1999a

000002ac <str>:
 2ac:   0000000a                .word   0x0000000a


So this period should be the time spent loading the program image (hw2.asmbin) into memory!
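Matching raw values on inst_fetch.io.instruction against the listing is easier with the instruction fields split out. A small helper sketch (the names are my own) using the standard RV32 field positions:

```scala
// Split an RV32 instruction word into its common fields.
object InstFields {
  final case class Fields(opcode: Int, rd: Int, funct3: Int, rs1: Int, rs2: Int, funct7: Int)

  def split(inst: Int): Fields = Fields(
    opcode = inst & 0x7f,
    rd     = (inst >>> 7) & 0x1f,
    funct3 = (inst >>> 12) & 0x7,
    rs1    = (inst >>> 15) & 0x1f,
    rs2    = (inst >>> 20) & 0x1f,
    funct7 = (inst >>> 25) & 0x7f
  )

  // I-type immediate: bits [31:20], sign-extended by the arithmetic shift.
  def immI(inst: Int): Int = inst >> 20

  def main(args: Array[String]): Unit = {
    val lw = 0x00062803  // lw a6,0(a2) at address 0xc
    println(split(lw))   // Fields(3,16,2,12,0,0): rd = x16 (a6), rs1 = x12 (a2)
    println(immI(lw))    // 0
  }
}
```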

InstructionFetch

After inst_fetch.io.instruction_read_data is loaded, it takes 3 CPU clock cycles until inst_fetch.io.instruction_address (the PC) is updated. After inst_fetch.io.instruction_address changes, it takes 1 CPU clock cycle until inst_fetch.io.instruction_read_data is loaded.


These 3 + 1 = 4 CPU clock cycles correspond to one clock cycle of the instruction fetch stage.
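One possible explanation for the 4:1 ratio, which I have not confirmed in the test harness source, is that the CPU runs on a divided clock that ticks once every 4 top-level cycles. The sketch below models such a divider with a wrapping counter; all names are hypothetical.

```scala
// Hypothetical model: a counter that lets the CPU advance once every
// `ratio` cycles of the top-level (testbench) clock.
object ClockDividerModel {
  // Returns the top-level cycles at which the divided CPU clock ticks.
  def cpuTicks(topCycles: Int, ratio: Int = 4): Seq[Int] = {
    var counter = 0
    val ticks = scala.collection.mutable.ArrayBuffer.empty[Int]
    for (cycle <- 0 until topCycles) {
      if (counter == 0) ticks += cycle // divided clock fires on this cycle
      counter = (counter + 1) % ratio  // 0, 1, 2, 3, 0, ...
    }
    ticks.toList
  }

  def main(args: Array[String]): Unit =
    println(cpuTicks(16)) // List(0, 4, 8, 12)
}
```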


InstructionDecode

InstructionDecode produces its outputs as soon as the input instruction changes, and they stay stable for 4 CPU cycles.


Unlike InstructionFetch, InstructionDecode has no clock signal. I guess this is because InstructionDecode contains no register, so its outputs are purely combinational.

The register inside InstructionFetch

 val pc = RegInit(ProgramCounter.EntryAddress)
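The difference can be illustrated in plain Scala: a combinational block is a pure function of its current inputs, while a register such as pc only takes a new value on a clock edge. The Reg class below is a hypothetical software analogue of RegInit, not Chisel code, and 0x1000 is a placeholder for the entry address.

```scala
// Software analogue of a Chisel register vs. combinational logic.
object RegVsComb {
  // Combinational: output depends only on the current input, no clock involved.
  def decodeRd(inst: Int): Int = (inst >>> 7) & 0x1f

  // Register: holds its value until tick(), like RegInit(init) at each clock edge.
  final class Reg(init: Long) {
    private var value = init
    def out: Long = value
    def tick(next: Long): Unit = value = next
  }

  def main(args: Array[String]): Unit = {
    val pc = new Reg(0x1000L) // placeholder entry address
    val next = pc.out + 4
    println(pc.out == 0x1000L)    // true: next is ready but not latched yet
    pc.tick(next)
    println(pc.out == 0x1004L)    // true: updated only after the clock edge
    println(decodeRd(0x00100893)) // 17 (a7), available immediately
  }
}
```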

Execute

Once Execute receives all of its inputs (instruction, instruction_address, reg1_data, reg2_data, immediate, aluop1_source and aluop2_source), it immediately produces its outputs mem_alu_result, if_jump_flag and if_jump_address, and holds them for 4 CPU cycles.

However, in some cases the outputs did not stay stable for 4 CPU cycles. Take the following auipc instruction as an example:

00000000 <_start>:
   0:   00100893                li      a7,1
   4:   00000617                auipc   a2,0x0

mem_alu_result changes at the 3rd CPU cycle. I found that this happens because ex.io.instruction (driven by inst_fetch.io.instruction) and ex.io.instruction_address (driven by inst_fetch.io.instruction_address) do not change at the same time.
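The effect can be reproduced with a one-line model of auipc (rd = PC + (imm20 << 12)). Using the objdump address 0x4 (the real run may use a different base address), pairing the same auipc word with a PC that has already advanced gives a different result, which is the kind of transient seen on mem_alu_result. This is a sketch, not the lab's ALU.

```scala
// Combinational auipc model: output is recomputed whenever either input changes.
object AuipcModel {
  def isAuipc(inst: Int): Boolean = (inst & 0x7f) == 0x17

  // rd = PC + upper immediate (instruction bits [31:12], low 12 bits zero).
  def auipc(pc: Int, inst: Int): Int = pc + (inst & ~0xfff)

  def main(args: Array[String]): Unit = {
    val inst = 0x00000617     // auipc a2,0x0
    println(auipc(0x4, inst)) // 4: PC and instruction belong together
    println(auipc(0x8, inst)) // 8: PC already advanced, instruction not yet
  }
}
```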


MemoryAccess

In the following lw example, the data is loaded from memory 1 CPU cycle after the instruction is fetched, and it is held for 3 CPU cycles.

00000000 <_start>:
   0:   00100893                li      a7,1
   4:   00000617                auipc   a2,0x0
   8:   00060613                mv      a2,a2
   c:   00062803                lw      a6,0(a2) # 4 <_start+0x4>
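For reference, lw forms the address rs1 + imm and assembles four bytes in little-endian order. The sketch below uses a byte array as memory; placing test0 at 0x294 follows the objdump and is only for illustration.

```scala
// Little-endian word load from a byte-addressed memory.
object LoadWordModel {
  def lw(mem: Array[Byte], rs1: Int, imm: Int): Int = {
    val addr = rs1 + imm
    require(addr % 4 == 0, s"misaligned lw at $addr")
    // Byte i of the word lands in bits [8i+7:8i].
    (0 until 4).map(i => (mem(addr + i) & 0xff) << (8 * i)).reduce(_ | _)
  }

  def main(args: Array[String]): Unit = {
    val mem = new Array[Byte](0x2b0)
    // test0 = 0x4141f9a7 stored little-endian at 0x294, as in the listing.
    Seq(0xa7, 0xf9, 0x41, 0x41).zipWithIndex.foreach { case (b, i) => mem(0x294 + i) = b.toByte }
    println(f"0x${lw(mem, 0x294, 0)}%08x") // 0x4141f9a7
  }
}
```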


WriteBack

In the following auipc example, wb_io_regs_write_data comes from ex_io_mem_alu_result:

00000000 <_start>:
   0:   00100893                li      a7,1
   4:   00000617                auipc   a2,0x0


In the following lw example, the instruction loads the word 0x4141f9a7. wb_io_regs_write_data is updated 1 CPU cycle after the instruction is loaded, because the data comes from mem_io_wb_memory_read_data.

00000000 <_start>:
   0:   00100893                li      a7,1
   4:   00000617                auipc   a2,0x0
   8:   00060613                mv      a2,a2
   c:   00062803                lw      a6,0(a2) # 4 <_start+0x4>

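The auipc and lw cases amount to a multiplexer on the write-back data. A simplified Scala model follows; the Source names are hypothetical, and the PC + 4 source that jal/jalr would need is left out.

```scala
// Write-back source selection: ALU result by default, memory data for loads.
object WriteBackModel {
  sealed trait Source
  case object AluResult extends Source
  case object MemoryReadData extends Source

  def sourceFor(inst: Int): Source =
    if ((inst & 0x7f) == 0x03) MemoryReadData else AluResult // 0x03 = LOAD opcode

  def writeData(src: Source, aluResult: Int, memReadData: Int): Int = src match {
    case AluResult      => aluResult
    case MemoryReadData => memReadData
  }

  def main(args: Array[String]): Unit = {
    println(writeData(sourceFor(0x00000617), 4, 0))                      // auipc: 4
    println(f"0x${writeData(sourceFor(0x00062803), 0, 0x4141f9a7)}%08x") // lw: 0x4141f9a7
  }
}
```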

Waveform on Verilator

The VCD generated by Verilator differs from the one generated during testing. There is no boot-up time: from the first CPU cycle, the CPU fetches, decodes and executes instructions. Also, the CPU cycle is 4 ps, compared with 2 ps in the VCD generated during testing.


inst_fetch_io_instruction_read_data is updated a quarter of a CPU cycle (1 ps) after inst_fetch_io_instruction_address changes, whereas during testing the delay is 1 CPU cycle (2 ps).

The interval between two consecutive instruction fetches is 1 CPU cycle (4 ps) in Verilator, whereas during testing it is 4 CPU cycles (8 ps).

ecall

I implemented ecall in this branch and tested it with an assembly program that prints RISC-V\n.

First, I added five signals to CPUBundle.scala: ecall_flag, which is asserted when an ecall instruction is decoded, plus ecall_a0, ecall_a1, ecall_a2 and ecall_a7. ecall_a1, ecall_a2 and ecall_a7 carry the contents of the corresponding registers.

In the Simulator.run function of verilog/verilator/sim_main.cpp, I check whether top->io_ecall_flag is true. If it is, I compare top->io_ecall_a7 with the system call numbers. If it is the write system call, I read the data with memory->read, starting at address top->io_ecall_a1 with length top->io_ecall_a2.
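The dispatch logic described above is sketched here in Scala to keep one language across this note; the real code is C++ in sim_main.cpp. The syscall number 64 for write is an assumption taken from the Linux RISC-V convention, a0 (the file descriptor) is ignored, and the address 16 in main is hypothetical.

```scala
import java.nio.charset.StandardCharsets

// Host-side model of handling a write ecall from the simulated CPU.
object EcallModel {
  val SysWrite = 64 // assumed: Linux RISC-V number for write

  // Returns the text a write ecall would print, or None for anything else.
  def handle(ecallFlag: Boolean, a7: Int, a1: Int, a2: Int, mem: Array[Byte]): Option[String] =
    if (!ecallFlag || a7 != SysWrite) None
    else Some(new String(mem.slice(a1, a1 + a2), StandardCharsets.US_ASCII))

  def main(args: Array[String]): Unit = {
    val msg = "RISC-V\n".getBytes(StandardCharsets.US_ASCII)
    val mem = new Array[Byte](64)
    System.arraycopy(msg, 0, mem, 16, msg.length) // string at hypothetical address 16
    handle(ecallFlag = true, a7 = SysWrite, a1 = 16, a2 = msg.length, mem).foreach(s => print(s))
  }
}
```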

Running make verilator produced the following errors:

[error] firrtl.passes.PassExceptions:
[error] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/board/verilator/Top.scala 15:14] : [module Top]  Reference io is not fully initialized.
[error]    : io.ecall_a7 <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/board/verilator/Top.scala 15:14] : [module Top]  Reference io is not fully initialized.
[error]    : io.ecall_flag <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/board/verilator/Top.scala 15:14] : [module Top]  Reference io is not fully initialized.
[error]    : io.ecall_a0 <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/board/verilator/Top.scala 15:14] : [module Top]  Reference io is not fully initialized.
[error]    : io.ecall_a2 <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/board/verilator/Top.scala 15:14] : [module Top]  Reference io is not fully initialized.
[error]    : io.ecall_a1 <= VOID
[error] firrtl.passes.PassException: 5 errors detected!

So I added the code below to src/main/scala/board/verilator/Top.scala:

--- a/src/main/scala/board/verilator/Top.scala
+++ b/src/main/scala/board/verilator/Top.scala
@@ -25,6 +25,12 @@ class Top extends Module {
   cpu.io.instruction     := io.instruction

   cpu.io.instruction_valid := io.instruction_valid
+
+  io.ecall_flag := cpu.io.ecall_flag
+  io.ecall_a0   := cpu.io.ecall_a0
+  io.ecall_a1   := cpu.io.ecall_a1
+  io.ecall_a2   := cpu.io.ecall_a2
+  io.ecall_a7   := cpu.io.ecall_a7
 }

 object VerilogGenerator extends App {

Then it worked! However, I'm not sure about the relationship between Top.scala and CPU.scala.

When I ran the Verilator simulation, I found a bug in my code: the string RISC-V\n is printed 3 times. I then inspected the waveform dumped by Verilator. Like any other instruction, ecall is held for 1 CPU cycle.


So I suspect the cause is the while loop in Simulator.run, which does not iterate exactly once per CPU cycle.
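If the loop evaluates the model several times while ecall_flag stays high, each evaluation triggers the handler again. One possible fix, not yet tested in my code, is to act only on the rising edge of ecall_flag. A Scala sketch of the idea (in sim_main.cpp this would be a variable holding the previous flag value across loop iterations); note that it would miss two back-to-back ecall instructions if the flag never drops between them.

```scala
// Fire only when ecall_flag goes from low to high, so a flag that stays
// high for several simulation steps is handled once.
final class RisingEdge {
  private var prev = false
  def apply(flag: Boolean): Boolean = {
    val rose = flag && !prev
    prev = flag
    rose
  }
}

object EdgeDemo {
  def main(args: Array[String]): Unit = {
    val edge = new RisingEdge
    // ecall_flag sampled on 6 loop iterations; it is high for 3 of them.
    val samples = List(false, true, true, true, false, false)
    println(samples.count(edge(_))) // 1: handled once instead of 3 times
  }
}
```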