
Assignment 3: Single-cycle RISC-V CPU

contributed by < jimmylu890303 >

Hello World in Chisel

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
  • The module has no inputs. Its only output is led, a 1-bit unsigned integer (UInt(1.W)).
  • CNT_MAX is a constant equal to 50000000 / 2 - 1 = 24999999.
  • cntReg is a 32-bit unsigned register initialized to 0.
  • blkReg is a 1-bit unsigned register initialized to 0.
  • On each clock cycle, cntReg is incremented by one.
  • When cntReg reaches CNT_MAX, cntReg is reset to zero and blkReg is inverted (~blkReg), so the LED toggles every CNT_MAX + 1 = 25,000,000 cycles.
  • The output led is driven by the value stored in blkReg.
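To check the timing described above, here is a small cycle-by-cycle model of the same counter written in ordinary Scala (not Chisel), so it runs without a hardware simulator. BlinkModel, step, and trace are my own hypothetical names, and cntMax is a parameter so a tiny value can stand in for 24999999.

```scala
// Plain-Scala cycle model of the Hello module (a sketch, not Chisel code).
// cntMax is a parameter so a small value can stand in for 24999999.
object BlinkModel {
  // Register contents: cntReg and blkReg
  final case class State(cnt: Long, blk: Int)

  // One rising clock edge: increment the counter, or wrap it and toggle the LED
  def step(s: State, cntMax: Long): State =
    if (s.cnt == cntMax) State(0L, s.blk ^ 1)
    else State(s.cnt + 1, s.blk)

  // Value of io.led after each of the first `cycles` clock edges (reset state excluded)
  def trace(cycles: Int, cntMax: Long): Seq[Int] =
    Iterator.iterate(State(0L, 0))(step(_, cntMax))
      .drop(1)
      .take(cycles)
      .map(_.blk)
      .toList
}

// Example: with cntMax = 2 the LED value changes every 3 cycles
// BlinkModel.trace(6, 2L) == List(0, 0, 1, 1, 1, 0)
```

With cntMax = 24999999 the LED holds each value for 25,000,000 cycles, so a full on/off period is 50,000,000 cycles. That is 1 s if the board clock is 50 MHz, which is an assumption suggested by the 50000000 constant rather than something stated in the code.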

Below, I rewrite the same logic with explicit multiplexers: the next value of each register is selected by a Mux on the condition cntReg === CNT_MAX. The behavior is the same as the when-based version.

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
  io.led := blkReg
}
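One way to confirm that the Mux version is equivalent is to model both update rules in plain Scala and compare them on every reachable state. This is a hypothetical software check of my own (BlinkEquivalence, whenStep, and muxStep are not part of the lab) and uses a small cntMax. Both functions read the old register values, which matches how register assignments take effect at the next clock edge.

```scala
// Plain-Scala models of the two Chisel descriptions of one clock edge.
object BlinkEquivalence {
  // when-version: default assignment, overridden inside the `when` block
  def whenStep(cnt: Long, blk: Int, cntMax: Long): (Long, Int) = {
    var nextCnt = cnt + 1 // cntReg := cntReg + 1.U
    var nextBlk = blk     // blkReg keeps its value unless overridden
    if (cnt == cntMax) {  // when(cntReg === CNT_MAX) { ... }
      nextCnt = 0L
      nextBlk = blk ^ 1
    }
    (nextCnt, nextBlk)
  }

  // Mux-version: each register gets exactly one 2-way selection
  def muxStep(cnt: Long, blk: Int, cntMax: Long): (Long, Int) = {
    def mux[T](sel: Boolean, a: T, b: T): T = if (sel) a else b
    (mux(cnt == cntMax, 0L, cnt + 1), mux(cnt == cntMax, blk ^ 1, blk))
  }

  // Compare both rules on every reachable (cnt, blk) state
  def agree(cntMax: Long): Boolean =
    (0L to cntMax).forall { c =>
      Seq(0, 1).forall(b => whenStep(c, b, cntMax) == muxStep(c, b, cntMax))
    }
}
```

Because the counter never exceeds cntMax, checking cnt in 0..cntMax with both LED values covers every state the hardware can reach.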

Lab 3: Single-Cycle RISC-V CPU

To complete the modules, code must be added to four Scala files:

  • InstructionFetch.scala
  • InstructionDecode.scala
  • Execute.scala
  • CPU.scala

InstructionFetch.scala:

In InstructionFetch.scala, the IF module selects the next value of the program counter based on the jump_flag_id signal: if jump_flag_id is 1, the PC is loaded with jump_address_id; otherwise it advances to PC + 4.
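The selection can be modeled in a few lines of plain Scala. This is a sketch, not the lab's Chisel code: FetchModel and nextPc are hypothetical names, the inputs stand in for jump_flag_id and jump_address_id from the test, and the 32-bit wrap-around mask is my own assumption.

```scala
// Sketch of the IF stage's next-PC selection (plain Scala, not the lab's Chisel).
object FetchModel {
  // jumpFlagId / jumpAddressId play the roles of jump_flag_id / jump_address_id
  def nextPc(pc: Long, jumpFlagId: Boolean, jumpAddressId: Long): Long =
    if (jumpFlagId) jumpAddressId // taken jump or branch
    else (pc + 4) & 0xFFFFFFFFL   // sequential fetch, wrapped to 32 bits
}

// Example: from PC 0x1004, no jump gives 0x1008; a jump to entry gives 0x1000
// FetchModel.nextPc(0x1004L, jumpFlagId = false, jumpAddressId = 0x1000L) == 0x1008L
```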


Test InstructionFetch

The test runs the fetch stage for 101 iterations (the range 0 to 100 is inclusive). In each iteration, a random number (0 or 1) decides whether a jump happens:

  • 0 (no jump): instruction_address is expected to be the previous address plus 4 (pre + 4).
  • 1 (jump): the jump target is entry, so instruction_address is expected to be entry.
// pre, cur, and entry are declared earlier in InstructionFetchTest.scala
for (x <- 0 to 100) {
  Random.nextInt(2) match {
    case 0 => // no jump: expect the sequential address
      cur = pre + 4
      c.io.jump_flag_id.poke(false.B)
      c.clock.step()
      c.io.instruction_address.expect(cur)
      pre = pre + 4
    case 1 => // jump: expect the jump target
      c.io.jump_flag_id.poke(true.B)
      c.io.jump_address_id.poke(entry)
      c.clock.step()
      c.io.instruction_address.expect(entry)
      pre = entry
  }
}

src/test/scala/riscv/singlecycle/InstructionFetchTest.scala

Analysis with GTKWave

  • jump_flag_id is set to 1
    (GTKWave screenshots)
    When jump_flag_id is 1, the program counter (PC) is set to 0x1000 (entry) on the next cycle.

  • jump_flag_id is set to 0
    (GTKWave screenshots)
    When jump_flag_id is 0, the PC is set to PC + 4 (0x1004 + 4 = 0x1008) on the next cycle.

InstructionDecode.scala:

In InstructionDecode.scala, the ID module decodes the 32-bit input instruction and generates the control signals for the rest of the datapath.
The part of the module that has to be completed derives the following 10 output signals from the instruction:

  • regs_reg1_read_address
  • regs_reg2_read_address
  • ex_immediate
  • ex_aluop1_source
  • ex_aluop2_source
  • memory_read_enable
  • memory_write_enable
  • wb_reg_write_source
  • reg_write_enable
  • reg_write_address
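The register addresses come straight from fixed bit fields of the instruction. The plain-Scala sketch below (DecodeFields is a hypothetical helper, not the lab's code) extracts those fields using the RV32I base-format bit positions, which makes it easy to work out by hand what the decoder should output for the instructions used in the test.

```scala
// Reference model of RV32I field extraction (plain Scala, not the lab's Chisel).
object DecodeFields {
  // Return inst[hi:lo] as an unsigned Int
  private def bits(inst: Long, hi: Int, lo: Int): Int =
    ((inst >>> lo) & ((1L << (hi - lo + 1)) - 1)).toInt

  def opcode(inst: Long): Int = bits(inst, 6, 0)
  def rd(inst: Long): Int     = bits(inst, 11, 7)   // also imm[4:0] for S-type
  def funct3(inst: Long): Int = bits(inst, 14, 12)
  def rs1(inst: Long): Int    = bits(inst, 19, 15)  // -> regs_reg1_read_address
  def rs2(inst: Long): Int    = bits(inst, 24, 20)  // -> regs_reg2_read_address
  def funct7(inst: Long): Int = bits(inst, 31, 25)
}
```

For example, 0x00a02223 decodes to opcode 0x23 (store) with rs1 = 0 and rs2 = 10, i.e. sw x10, 4(x0), and 0x002081b3 is add x3, x1, x2. For lui (0x000022b7, i.e. lui x5, 0x2), bits 19:15 belong to the immediate rather than to rs1; here they happen to be 0, consistent with the test expecting regs_reg1_read_address to be 0.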

Test InstructionDecode

c.io.instruction.poke(0x00a02223L.U) // S-type: sw x10, 4(x0)
c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
c.io.regs_reg1_read_address.expect(0.U)
c.io.regs_reg2_read_address.expect(10.U)
c.clock.step()

c.io.instruction.poke(0x000022b7L.U) // lui x5, 0x2
c.io.regs_reg1_read_address.expect(0.U)
c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
c.clock.step()

c.io.instruction.poke(0x002081b3L.U) // add x3, x1, x2
c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
c.io.ex_aluop2_source.expect(ALUOp2Source.Register)
c.clock.step()

src/test/scala/riscv/singlecycle/InstructionDecoderTest.scala

This test checks three instructions: an S-type store (sw), lui, and an R-type add.

Analysis with GTKWave

  • Instruction 0x00a02223 (S-type, sw x10, 4(x0))
    (GTKWave screenshot)
  • Instruction 0x000022b7 (lui x5, 0x2)
    (GTKWave screenshot)
  • Instruction 0x002081b3 (add x3, x1, x2)
    (GTKWave screenshot)

Execute.scala

In the Execute.scala file, there are two main modules.

One is the ALU control, responsible for generating the corresponding ALU function code based on the opcode, funct3, and funct7 of the input instruction.

The other is the ALU, which performs the designated function determined by the ALU function code generated by the ALU control.
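The mapping from (opcode, funct3, funct7) to an ALU function follows the RV32I encoding tables. Below is a plain-Scala sketch of that mapping together with the ALU itself. The function names (Add, Sub, ...) and AluControlModel are hypothetical stand-ins, not the lab's own definitions, and treating every non-arithmetic opcode as an addition is my simplification for address and target computation.

```scala
// Hypothetical RV32I ALU-control + ALU model (plain Scala, not the lab's Chisel).
object AluControlModel {
  sealed trait Fn
  case object Add  extends Fn; case object Sub extends Fn
  case object Sll  extends Fn; case object Slt extends Fn
  case object Sltu extends Fn; case object Xor extends Fn
  case object Srl  extends Fn; case object Sra extends Fn
  case object Or   extends Fn; case object And extends Fn

  private val OpImm = 0x13 // I-type arithmetic (addi, slli, ...)
  private val Op    = 0x33 // R-type arithmetic (add, sub, ...)

  def aluFn(opcode: Int, funct3: Int, funct7: Int): Fn = opcode match {
    case Op | OpImm =>
      funct3 match {
        // SUB exists only in R-type; ADDI never subtracts
        case 0 => if (opcode == Op && funct7 == 0x20) Sub else Add
        case 1 => Sll
        case 2 => Slt
        case 3 => Sltu
        case 4 => Xor
        case 5 => if (funct7 == 0x20) Sra else Srl // SRAI/SRLI use imm[11:5]
        case 6 => Or
        case _ => And // funct3 == 7
      }
    // loads, stores, branches, jal/jalr, lui, auipc: the ALU adds its operands
    case _ => Add
  }

  // Int is 32-bit two's complement, so add/sub wrap like RV32 registers
  def alu(fn: Fn, a: Int, b: Int): Int = fn match {
    case Add  => a + b
    case Sub  => a - b
    case Sll  => a << (b & 0x1f)
    case Slt  => if (a < b) 1 else 0
    case Sltu => if (Integer.compareUnsigned(a, b) < 0) 1 else 0
    case Xor  => a ^ b
    case Srl  => a >>> (b & 0x1f)
    case Sra  => a >> (b & 0x1f)
    case Or   => a | b
    case And  => a & b
  }
}
```

For the test instruction 0x001101b3 (opcode 0x33, funct3 0, funct7 0), aluFn returns Add, so mem_alu_result is reg1_data + reg2_data.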

The complete Execute module outputs the ALU result (mem_alu_result) together with the jump signals if_jump_flag and if_jump_address, which feed the jump inputs of the IF stage.
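Here is a sketch of the jump logic in plain Scala. It assumes, as ExecuteTest suggests, that the branch condition compares reg1_data with reg2_data and that the target is always computed as instruction_address + immediate, whether or not the branch is taken. JumpModel and its methods are hypothetical names; the conditions and the jalr low-bit clearing follow the RV32I specification.

```scala
// Sketch of if_jump_flag / if_jump_address generation (plain Scala model).
object JumpModel {
  private val Branch = 0x63
  private val Jal    = 0x6f
  private val Jalr   = 0x67

  def jumpFlag(opcode: Int, funct3: Int, rs1: Int, rs2: Int): Boolean = opcode match {
    case Jal | Jalr => true // unconditional jumps
    case Branch =>
      funct3 match {
        case 0 => rs1 == rs2                             // beq
        case 1 => rs1 != rs2                             // bne
        case 4 => rs1 < rs2                              // blt
        case 5 => rs1 >= rs2                             // bge
        case 6 => Integer.compareUnsigned(rs1, rs2) < 0  // bltu
        case 7 => Integer.compareUnsigned(rs1, rs2) >= 0 // bgeu
        case _ => false                                  // funct3 2, 3 are not branches
      }
    case _ => false
  }

  // Target: pc + imm for branches and jal; (rs1 + imm) with bit 0 cleared for jalr
  def jumpAddress(opcode: Int, pc: Int, rs1: Int, imm: Int): Int =
    if (opcode == Jalr) (rs1 + imm) & ~1 else pc + imm
}
```

With instruction_address = 2 and immediate = 2 (the values in the beq test), jumpAddress returns 4 whether or not the registers match and only jumpFlag changes, which is why the not-equal case still expects if_jump_address to be 4.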

Test Execute

c.io.instruction.poke(0x001101b3L.U) // add x3, x2, x1 (x3 = x2 + x1)

for (x <- 0 to 100) {
  // operands stay below 429496729, so op1 + op2 fits in a positive Int
  val op1    = scala.util.Random.nextInt(429496729)
  val op2    = scala.util.Random.nextInt(429496729)
  val result = op1 + op2
  val addr   = scala.util.Random.nextInt(32)

  c.io.reg1_data.poke(op1.U)
  c.io.reg2_data.poke(op2.U)

  c.clock.step()
  c.io.mem_alu_result.expect(result.U)
  c.io.if_jump_flag.expect(0.U)
}

// beq test
c.io.instruction.poke(0x00208163L.U) // beq x1, x2, 2: pc + 2 if x1 === x2
c.io.instruction_address.poke(2.U)
c.io.immediate.poke(2.U)
c.io.aluop1_source.poke(1.U) // op1 = instruction address
c.io.aluop2_source.poke(1.U) // op2 = immediate
c.clock.step()

// operands equal: branch taken
c.io.reg1_data.poke(9.U)
c.io.reg2_data.poke(9.U)
c.clock.step()
c.io.if_jump_flag.expect(1.U)
c.io.if_jump_address.expect(4.U)

// operands differ: branch not taken, but the target is still computed
c.io.reg1_data.poke(9.U)
c.io.reg2_data.poke(19.U)
c.clock.step()
c.io.if_jump_flag.expect(0.U)
c.io.if_jump_address.expect(4.U)

src/test/scala/riscv/singlecycle/ExecuteTest.scala

This test covers two instructions: add (x3 = x2 + x1), checked with 101 random operand pairs, and beq, checked with equal and unequal operands.

Analysis with GTKWave

  • x3 = x2 + x1
    (GTKWave screenshot)
  • beq (operands equal, branch taken)
    (GTKWave screenshot)
  • beq (operands not equal, branch not taken)
    (GTKWave screenshot)

Modifying the handwritten RISC-V assembly code from Homework 2

Modifying the original Homework 2 code

Because the single-cycle CPU has no system call for printing, the program cannot print its result directly while it runs. Instead of using the print system call, I changed the code to store each result in memory.

In Homework 2, the result was displayed through rv32emu's write system call, which required converting the numeric result to its ASCII character before printing:

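jal ra, pimo                # result of pimo is returned in a0
addi a0, a0, 48             # convert the digit to ASCII ('0' = 48)
la t0, buffer
sb zero, 1(t0)              # buffer[1] = 0
sb a0, 0(t0)                # buffer[0] = ASCII digit
li a0, 1                    # fd = 1 (stdout)
la a1, buffer               # address of the bytes to write
li a2, 2                    # number of bytes to write
li a7, SYSWRITE
ecall                       # print the result of pimo from buffer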

In Homework 3, the result is stored in memory instead:

# store the result (a0) at the address held in s3
sw a0, 0(s3)

To run the code on the single-cycle CPU, I made the following changes:

  • Put my code, main.S, into the /csrc directory.
  • Store the results at memory addresses 0x4, 0x8, 0xC, and 0x10.
  • Modify the Makefile to generate main.asmbin.
  • Move the generated main.asmbin into src/main/resources.
  • Add a corresponding test, Hw2Test, to CPUTest.scala.

Test my RISC-V assembly

To test my RISC-V assembly code, I've added a test named Hw2Test to CPUTest.scala. Here, I verify the results at memory addresses 0x4, 0x8, 0xC, and 0x10.

class Hw2Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "Implementation of multiplication overflow prediction for unsigned integers using CLZ" in {
    test(new TestTopModule("main.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 10) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      // expected results at 0x4, 0x8, 0xC, 0x10: 0, 0, 1, 1
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)

      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)

      c.io.mem_debug_read_address.poke(12.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)

      c.io.mem_debug_read_address.poke(16.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)
    }
  }
}

Run test:

sbt "testOnly riscv.singlecycle.Hw2Test"

Output:

[info] welcome to sbt 1.9.7 (OpenLogic Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ..
[info] loading project definition from /home/jimmy/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/jimmy/ca2023-lab3/)
[info] Hw2Test:
[info] Single Cycle CPU
[info] - should Implementation of multiplication overflow prediction for unsigned integers using CLZ
[info] Run completed in 29 seconds, 876 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 35 s, completed Nov 25, 2023, 2:34:59 PM

Using Verilator to Run the Assembly

./run-verilator.sh -instruction src/main/resources/main.asmbin -time 2000 -vcd dump01.vcd

Output:

-time 2000
-memory 1048576
-instruction src/main/resources/main.asmbin
[-------------------->] 100%

Viewing the waveform in GTKWave

Case 1:

(GTKWave screenshot: previous cycle)

(GTKWave screenshot: next cycle)

  • Instruction 0x024000EF is jal ra, pimo (jal x1, 36).
  • The PC is 0x00001050 and regs_write_source = 0b11, so the value written back to ra is PC + 4 = 0x00001054.
  • The jump target 0x1074 (0x1050 + 36) is computed by the ALU, and if_jump_flag is 1 (set by the jump-decision logic).
  • So on the next cycle, the PC is set to 0x1074.
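The offset 36 and the target 0x1074 can be checked by decoding the J-type fields by hand. Below is a plain-Scala sketch (JalDecode is a hypothetical helper) that follows the RV32I J-type layout, where the 20-bit immediate is scattered across inst[31:12] as imm[20|10:1|11|19:12].

```scala
// Decode the J-type immediate and rd of a jal instruction (plain Scala sketch).
object JalDecode {
  def jImm(inst: Int): Int = {
    val imm20    = (inst >>> 31) & 0x1   // inst[31]
    val imm10_1  = (inst >>> 21) & 0x3ff // inst[30:21]
    val imm11    = (inst >>> 20) & 0x1   // inst[20]
    val imm19_12 = (inst >>> 12) & 0xff  // inst[19:12]
    val raw = (imm20 << 20) | (imm19_12 << 12) | (imm11 << 11) | (imm10_1 << 1)
    (raw << 11) >> 11 // sign-extend from bit 20
  }

  def rd(inst: Int): Int = (inst >>> 7) & 0x1f
}
```

For 0x024000EF this gives rd = 1 (ra) and an offset of 36, so the target is 0x1050 + 36 = 0x1074 and ra receives 0x1050 + 4 = 0x1054.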

Case 2:

(GTKWave screenshot)

  • Instruction 0x00512023 is sw t0, 0(sp) (sw x5, 0(x2)).
  • io_memory_write_enable is 1 because this is a store-word instruction.
  • ALU.op1 is the value of sp (the base address), and ALU.op2 is the offset (immediate), which is 0.
  • io_mem_alu_result is the target memory address (sp + 0).
  • regs_io_read_data2 is the value of t0 (x5), which is the data written to memory.
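The store address is the base register plus the sign-extended S-type immediate, whose bits are split between inst[31:25] and inst[11:7]. The sketch below (StoreDecode is a hypothetical helper in plain Scala) reassembles that immediate and computes the effective address that the ALU produces for a store.

```scala
// Rebuild the S-type immediate and the store address (plain Scala sketch).
object StoreDecode {
  // imm[11:5] = inst[31:25], imm[4:0] = inst[11:7]
  def sImm(inst: Int): Int = {
    val raw = (((inst >>> 25) & 0x7f) << 5) | ((inst >>> 7) & 0x1f)
    (raw << 20) >> 20 // sign-extend from bit 11
  }

  // Effective address computed by the ALU: x[rs1] + imm
  def address(rs1Value: Int, inst: Int): Int = rs1Value + sImm(inst)
}
```

For 0x00512023 the immediate is 0, so the memory address equals the value of sp; for 0x00a02223 (the store used in the decoder test) it is 4.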

Case 3:
(GTKWave screenshot)

  • Instruction 0x00400993 is li s3, 4 (addi x19, x0, 4).
  • ALU.op1 is the value of x0 (always 0), and ALU.op2 is the immediate (4).
  • ALU.mem_alu_result is 0x4 and wb_reg_write_source is 0b00, so regs_write_data is taken from ALU.mem_alu_result.
  • The destination register is 0x13 (x19, s3) and regs_io_write_enable is 1, so s3 is set to 4.
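The machine code for li s3, 4 can be reproduced with a small I-type encoder. This plain-Scala sketch (ITypeEncode is a hypothetical helper) assumes the usual assembler expansion of li with a small immediate to addi rd, x0, imm.

```scala
// Encode an RV32I I-type instruction (plain Scala sketch).
object ITypeEncode {
  // Layout: imm[11:0] | rs1 | funct3 | rd | opcode
  def encode(imm: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Int =
    ((imm & 0xfff) << 20) | ((rs1 & 0x1f) << 15) | ((funct3 & 0x7) << 12) |
      ((rd & 0x1f) << 7) | (opcode & 0x7f)

  // li rd, imm (imm fits in 12 bits) expands to addi rd, x0, imm
  def li(rd: Int, imm: Int): Int = encode(imm, 0, 0, rd, 0x13)
}
```

li(19, 4) evaluates to 0x00400993: the immediate 4 sits in bits 31:20, rd = 19 (0x13) in bits 11:7, and the OP-IMM opcode 0x13 in bits 6:0.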