
Assignment3: Single-cycle RISC-V CPU

contributed by chihenliu

1. Environment Setup

My OS is Ubuntu 22.04.3 LTS

1.1. Install the dependent packages

$ sudo apt install build-essential verilator gtkwave

1.2. Install sbt/JDK/SDKMAN

1.2.1 Install SDKMAN

Follow the official instructions to install SDKMAN:

$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"
$ sdk version

1.2.2 Install sbt

Follow the instructions to install sbt. sbt needs a JDK, so a Temurin Java 8 build is installed first:

$ sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1)
$ sdk install sbt

The installation of sbt is complete.

1.2.3 Install JDK

Follow the instructions in Lab3: Construct a single-cycle RISC-V CPU with Chisel:

$ sdk install java 11.0.21-tem

The installation of JDK is complete.

1.2.4 Install GTKWave

The general installation from source takes three steps:

1. ./configure
2. make
3. sudo make install

However, this failed with errors on my Ubuntu system, so I first installed the dependencies listed in README.md:

$ sudo apt-get install libjudy-dev
$ sudo apt-get install libbz2-dev
$ sudo apt-get install liblzma-dev
$ sudo apt-get install libgconf2-dev
$ sudo apt-get install libgtk2.0-dev
$ sudo apt-get install tcl-dev
$ sudo apt-get install tk-dev
$ sudo apt-get install gperf
$ sudo apt-get install gtk2-engines-pixbuf

After installing these packages, I installed GTKWave with ./configure, make, and sudo make install.

2. Explanation of Hello World in Chisel

2.1 Chisel tutorials

Follow the instructions:

$ git clone https://github.com/ucb-bar/chisel-tutorial
$ cd chisel-tutorial
$ git checkout release
$ sbt run

Output:

test Hello Success: 1 tests passed in 6 cycles taking 0.004980 seconds

[info] [0.002] RAN 1 CYCLES PASSED
[success] Total time: 2 s, completed

You can also run all the examples:

$ ./run-examples.sh all

2.2 Chisel Bootcamp

  • 1.Introduction to Scala
  • 2.1.Your First Chisel Module
  • 2.2.Combinational Logic
  • 2.3.Control Flow
  • 2.4.Sequential Logic
  • 2.5.Putting it all Together: An FIR Filter
  • 2.6.More on ChiselTest
  • 3.1.Generators: Parameters
  • 3.2.Generators: Collections
  • 3.3.Chisel Standard Library
  • 3.4.Higher-Order Functions
  • 3.5.Functional Programming
  • 3.6.Object Oriented Programming
  • 3.7.Generators: Types

I will go through all of these sections before Dec. 1.

2.3 Explanation of Hello World

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
  • The I/O Bundle has a single output signal, led, and no input signals. led is an unsigned integer 1 bit wide.
  • cntReg is a 32-bit unsigned integer register, initialized to 0.
  • blkReg is a 1-bit unsigned integer register, initialized to 0, that holds the state of the LED.
  • CNT_MAX is a constant with the value 24999999. It is typically chosen from the system clock frequency and sets how fast the LED blinks.
  • In each clock cycle, cntReg increases by 1.
  • When cntReg reaches CNT_MAX, cntReg is reset to 0 and blkReg is inverted.
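The blink rate implied by these bullets can be worked out with a little arithmetic. The sketch below is plain Scala (no Chisel) and assumes a 50 MHz clock, which the constant 50000000 suggests but the source does not state; the object and value names are hypothetical.

```scala
object BlinkTiming {
  // Assumption: a 50 MHz clock, suggested by the constant 50000000 in CNT_MAX.
  val clockHz: Long = 50000000L

  // Same expression as CNT_MAX in the Chisel module: 24999999.
  val cntMax: Long = clockHz / 2 - 1

  // cntReg counts 0..cntMax, so blkReg toggles once every cntMax + 1 cycles.
  val cyclesPerToggle: Long = cntMax + 1

  // Two toggles (off -> on -> off) make one full blink period.
  val blinkPeriodSeconds: Double = 2.0 * cyclesPerToggle / clockHz

  def main(args: Array[String]): Unit =
    println(s"toggle every $cyclesPerToggle cycles, blink period = $blinkPeriodSeconds s")
}
```

Under that assumption the LED toggles every 25,000,000 cycles (0.5 s), giving a blink period of one second.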

Eliminating blkReg gives a different LED behavior:

// Hello in Chisel, after eliminating blkReg
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })

  val cntMax = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  cntReg := Mux(cntReg === cntMax, 0.U, cntReg + 1.U)
  io.led := cntReg === cntMax
}
  • When cntReg reaches cntMax, the LED is driven ON (1); in every other cycle it is OFF (0).
  • The LED therefore flashes for a single clock cycle each time cntReg reaches cntMax, rather than staying lit until the next counting period completes.
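To make the difference between the two variants concrete, here is a small cycle-by-cycle software model in plain Scala. It is only a sketch: the object and function names are hypothetical, and a tiny cntMax is used so the pattern is visible in a few cycles.

```scala
object LedModels {
  // Model of the original Hello: the LED shows blkReg, which toggles at cntMax.
  def toggleVariant(cntMax: Int, cycles: Int): Seq[Int] = {
    var cnt = 0
    var blk = 0
    (0 until cycles).map { _ =>
      val led = blk // io.led := blkReg (value seen in this cycle)
      if (cnt == cntMax) { cnt = 0; blk = 1 - blk } else cnt += 1
      led
    }
  }

  // Model of the variant without blkReg: the LED is high only when cntReg === cntMax.
  def pulseVariant(cntMax: Int, cycles: Int): Seq[Int] = {
    var cnt = 0
    (0 until cycles).map { _ =>
      val led = if (cnt == cntMax) 1 else 0 // io.led := cntReg === cntMax
      cnt = if (cnt == cntMax) 0 else cnt + 1
      led
    }
  }
}
```

With cntMax = 3, toggleVariant produces four cycles off followed by four cycles on, while pulseVariant is on for only one cycle in every four.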

3. Complete Lab 3 MyCPU code

3.1. Single-cycle CPU

Single-cycle CPU diagram (image not available)

Instruction Fetch stage

Instruction Decode stage

Execute stage

Memory Access stage

Write-Back stage

3.2. Finish MyCPU code

We need to add code to four Scala files in src/main/scala/riscv/core to complete the modules:

  • InstructionFetch.scala
  • InstructionDecode.scala
  • Execute.scala
  • CPU.scala

By completing the Instruction Fetch, Instruction Decode, and Execute stages, and then using the aforementioned components, I have completed the CPU section.
Here is my repository for Lab 3, which was forked from ca2023-lab3.

3.3. MyCPU test and Waveform

Test command:

$ sbt test

Before the CPU code is completed, the tests fail with the following output:

[info] *** 6 TESTS FAILED ***
[error] Failed tests:
[error] 	riscv.singlecycle.InstructionDecoderTest
[error] 	riscv.singlecycle.ByteAccessTest
[error] 	riscv.singlecycle.InstructionFetchTest
[error] 	riscv.singlecycle.ExecuteTest
[error] 	riscv.singlecycle.FibonacciTest
[error] 	riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful

After completing the missing code for the Instruction Fetch, Instruction Decode, and Execute stages as well as the CPU, I ran the tests again with the command provided in Lab 3:

$ sbt test

This gives the following output:

[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 9 seconds, 325 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 28, 2023, 5:41:06 PM

To run a single test case, use the following command:

$ sbt "testOnly riscv.singlecycle.XXXTest"

3.3.1. InstructionFetch test

The PC is initialized to ProgramCounter.EntryAddress. The control signal jump_flag_id determines whether a jump is taken: if it is true, the PC is updated to the address given by jump_address_id; if it is false, the PC is incremented by 4 to fetch the next instruction.
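The PC update rule described above can be sketched as a pure function. This is a plain-Scala model rather than the Chisel module; PcModel and nextPc are hypothetical names, and the entry address 0x1000 is taken from the `entry` value used in the test.

```scala
object PcModel {
  // Assumption: EntryAddress is 0x1000, matching `entry` in InstructionFetchTest.
  val EntryAddress: Long = 0x1000L

  // Next PC value: the jump target when the jump flag is set, otherwise PC + 4.
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (jumpFlag) jumpAddress else pc + 4
}
```

For example, nextPc(0x1008, jumpFlag = true, 0x1000) gives 0x1000, while nextPc(0x1000, jumpFlag = false, 0x1000) gives 0x1004.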

InstructionFetchTest.scala
class InstructionFetchTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionFetch of Single Cycle CPU")
  it should "fetch instruction" in {
    test(new InstructionFetch).withAnnotations(TestAnnotations.annos) { c =>
      val entry = 0x1000
      var pre   = entry
      var cur   = pre
      c.io.instruction_valid.poke(true.B)
      var x = 0
      for (x <- 0 to 100) {
        Random.nextInt(2) match {
          case 0 => // no jump
            cur = pre + 4
            c.io.jump_flag_id.poke(false.B)
            c.clock.step()
            c.io.instruction_address.expect(cur)
            pre = pre + 4
          case 1 => // jump
            c.io.jump_flag_id.poke(true.B)
            c.io.jump_address_id.poke(entry)
            c.clock.step()
            c.io.instruction_address.expect(entry)
            pre = entry
        }
      }
    }
  }
}

In this test, each of the 101 loop iterations draws a random number. If it is 0, no jump is taken and the expected PC is pre + 4. If it is 1, a jump to the entry address is taken and the expected PC is entry.

Waveform
  • jump_flag_id set to 1

screenshot 2023-11-28 203504
screenshot 2023-11-28 203516
When jump_flag_id is set to 1, the PC does not advance by 4 from 0x1008 to 0x100C; instead, it jumps directly to 0x1000.

  • jump_flag_id set to 0
    screenshot 2023-11-28 204421
    screenshot 2023-11-28 204411

When jump_flag_id is set to 0, the PC goes from 0x1000 to 0x1004 on the next clock cycle, following the PC + 4 rule.

3.3.2. Instruction Decode test

In the ID stage, the ID unit decodes the input signal instruction and generates the control signals for the rest of the circuit. After completing the ID module, you will obtain a total of 10 outputs.

InstructionDecodeTest.scala
class InstructionDecoderTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionDecoder of Single Cycle CPU")
  it should "produce correct control signal" in {
    test(new InstructionDecode).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x00a02223L.U) // S-type
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.regs_reg2_read_address.expect(10.U)
      c.clock.step()

      c.io.instruction.poke(0x000022b7L.U) // lui
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.clock.step()

      c.io.instruction.poke(0x002081b3L.U) // add
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Register)
      c.clock.step()
    }
  }
}

The above code verifies three instructions: S-type, lui, and add. I added two signals, memory_read_enable and memory_write_enable, in the InstructionDecode.scala file, but the above test case does not check memory_write_enable. An additional test case for memory_write_enable could be added as part of completing Assignment 3.
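To see why the test expects register addresses such as 0 and 10 for the S-type word, the instruction fields can be extracted by hand. The following plain-Scala sketch uses the standard RV32I field positions; DecodeFields, Fields, decode, and immS are hypothetical names, not part of the lab code.

```scala
object DecodeFields {
  final case class Fields(opcode: Int, rd: Int, funct3: Int, rs1: Int, rs2: Int, funct7: Int)

  // Slice a 32-bit instruction word into the fixed RV32I bit fields.
  def decode(inst: Long): Fields = Fields(
    opcode = (inst & 0x7f).toInt,
    rd     = ((inst >> 7) & 0x1f).toInt,
    funct3 = ((inst >> 12) & 0x7).toInt,
    rs1    = ((inst >> 15) & 0x1f).toInt,
    rs2    = ((inst >> 20) & 0x1f).toInt,
    funct7 = ((inst >> 25) & 0x7f).toInt
  )

  // S-type immediate: imm[11:5] = inst[31:25], imm[4:0] = inst[11:7], sign-extended.
  def immS(inst: Long): Int = {
    val raw = (((inst >> 25) & 0x7f) << 5) | ((inst >> 7) & 0x1f)
    (raw.toInt << 20) >> 20
  }
}
```

Decoding 0x00a02223 gives opcode 0x23 (store), rs1 = 0, rs2 = 10, and an immediate of 4, that is, sw x10, 4(x0). Likewise 0x002081b3 decodes to add x3, x1, x2 and 0x000022b7 has opcode 0x37 (lui) with rd = 5.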

Waveform

screenshot 2023-11-28 213124
S-type Waveform
screenshot 2023-11-28 213214
lui Waveform
screenshot 2023-11-28 213256
add Waveform
screenshot 2023-11-28 213424
screenshot 2023-11-28 221000
screenshot 2023-11-28 221031

3.3.3. Execute test

Based on Execute.scala, this stage is primarily composed of two modules: ALU and ALU Control. ALU Control takes the opcode, funct3, and funct7 fields of the instruction and generates the function code for the ALU. The ALU then performs the selected operation, and the stage outputs mem_alu_result together with the jump signals if_jump_flag and if_jump_address.

ExecuteTest.scala
class ExecuteTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Execution of Single Cycle CPU")
  it should "execute correctly" in {
    test(new Execute).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x001101b3L.U) // x3 =  x2 + x1

      var x = 0
      for (x <- 0 to 100) {
        val op1    = scala.util.Random.nextInt(429496729)
        val op2    = scala.util.Random.nextInt(429496729)
        val result = op1 + op2
        val addr   = scala.util.Random.nextInt(32)

        c.io.reg1_data.poke(op1.U)
        c.io.reg2_data.poke(op2.U)

        c.clock.step()
        c.io.mem_alu_result.expect(result.U)
        c.io.if_jump_flag.expect(0.U)
      }

      // beq test
      c.io.instruction.poke(0x00208163L.U) // pc + 2 if x1 === x2
      c.io.instruction_address.poke(2.U)
      c.io.immediate.poke(2.U)
      c.io.aluop1_source.poke(1.U)
      c.io.aluop2_source.poke(1.U)
      c.clock.step()

      // equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(9.U)
      c.clock.step()
      c.io.if_jump_flag.expect(1.U)
      c.io.if_jump_address.expect(4.U)

      // not equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(19.U)
      c.clock.step()
      c.io.if_jump_flag.expect(0.U)
      c.io.if_jump_address.expect(4.U)
    }
  }
}

I added the signal assignments for alu.io.func, alu.io.op1, and alu.io.op2 in Execute that were previously missing. This test verifies three cases: add (x3 = x2 + x1), beq with equal operands, and beq with unequal operands.
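The values the test expects can be reproduced with a small reference model. This is a plain-Scala sketch of the behavior the test checks, not the Execute module itself; ExecuteModel, add, and beq are hypothetical names.

```scala
object ExecuteModel {
  // add: 32-bit addition with wrap-around, as mem_alu_result would hold it.
  def add(op1: Long, op2: Long): Long = (op1 + op2) & 0xffffffffL

  // beq: returns (jump flag, jump address).
  // The target pc + imm is computed regardless of the comparison result,
  // which is why the test expects address 4 in both the equal and unequal cases.
  def beq(pc: Long, imm: Long, reg1: Long, reg2: Long): (Boolean, Long) =
    (reg1 == reg2, (pc + imm) & 0xffffffffL)
}
```

With instruction_address = 2 and immediate = 2 as poked in the test, beq(2, 2, 9, 9) yields (true, 4) and beq(2, 2, 9, 19) yields (false, 4).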

Waveform

add (x3 = x2 + x1)
screenshot 2023-11-28 220115
beq (equal)
screenshot 2023-11-28 220203
beq (not equal)
screenshot 2023-11-28 220219

4. Adapting Homework 2 Assembly Code to MyCPU

4.1. Modify the original Homework 2 code

Because the single-cycle CPU lacks system calls, I removed the ecall, rdcycle, and rdcycleh instructions and added the _start and loop labels.

.global itof_clz
.global _start
_start:

    la t0, num
    lw a0, 12(t0)
    lw a1, 8(t0)
    jal itof_clz
    li t0,1
    li t1,2
    li t2,3

loop:
    j loop 

4.2. Test my RISC-V assembly

I wrote my test in CPUtest; here is the test program:

class itof_clzTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "convert integer to floating point" in {
    test(new TestTopModule("itof_clz.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step(1000) // Avoid timeout
        c.io.mem_debug_read_address.poke((i * 4).U) // Assume the converted result is stored in memory sequentially
      }

      c.io.regs_debug_read_address.poke(10.U)
      println(s"${c.io.regs_debug_read_data.peek()}")
      c.io.regs_debug_read_data.expect(1088462400.U)
      c.io.regs_debug_read_address.poke(11.U)
      println(s"${c.io.regs_debug_read_data.peek()}")
      c.io.regs_debug_read_data.expect(0.U)

    }
  }
}

The main goal is to test whether my integer is converted to IEEE-754 floating point correctly.
Command to run the single test:

$ sbt "testOnly riscv.singlecycle.itof_clzTest"

Running this test produces the following success output:

[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/chihen/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/chihen/ca2023-lab3/)
UInt<32>(1088462400)
UInt<32>(0)
[info] itof_clzTest:
[info] Single Cycle CPU
[info] - should convert integer to floating point
[info] Run completed in 18 seconds, 664 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 19 s, completed Nov 29, 2023, 8:36:22 PM

Input     Output
0x84f2    UInt<32>(1088462400)

I checked my output with an IEEE-754 converter, and it is correct.
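The same check can be done offline. The expected register values are consistent with reading a0 and a1 as the high and low 32-bit words of the IEEE-754 double-precision encoding of 0x84f2 (34034); that interpretation is an inference from the numbers, not something the test states. A minimal plain-Scala sketch, with hypothetical names DoubleWords and words:

```scala
object DoubleWords {
  // Split the IEEE-754 double encoding of n into (high word, low word).
  def words(n: Long): (Long, Long) = {
    val bits = java.lang.Double.doubleToLongBits(n.toDouble)
    val high = (bits >>> 32) & 0xffffffffL
    val low  = bits & 0xffffffffL
    (high, low)
  }

  def main(args: Array[String]): Unit = {
    val (high, low) = words(0x84f2L)
    println(s"high = $high, low = $low")
  }
}
```

For 0x84f2 this gives a high word of 1088462400 (0x40E09E40) and a low word of 0, matching the values expected for registers x10 and x11 in the test.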

4.3. Using Verilator to Run the Assembly and Visualize Waveforms

Use the following commands to build the simulation executable of the CPU and run the program:

$ make verilator
$ ./run-verilator.sh -instruction src/main/resources/itof_clz.asmbin -time 4000 -vcd itofclz01.vcd

Output:

-time 4000
-memory 1048576
-instruction src/main/resources/itof_clz.asmbin
[-------------------->] 100%

Waveform

An online RISC-V instruction encoder/decoder lets us quickly see which registers an instruction uses and find the instruction word in the waveform, which makes the signal changes easier to follow.
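The encodings examined in this section can also be reproduced without an online tool. The sketch below packs the standard RV32I R-type, I-type, and S-type fields in plain Scala; RiscvEncode and its functions are hypothetical helper names.

```scala
object RiscvEncode {
  // R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode
  def rType(funct7: Int, rs2: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Long =
    (funct7.toLong << 25) | (rs2.toLong << 20) | (rs1.toLong << 15) |
      (funct3.toLong << 12) | (rd.toLong << 7) | opcode.toLong

  // I-type: imm[11:0] | rs1 | funct3 | rd | opcode
  def iType(imm: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Long =
    ((imm & 0xfff).toLong << 20) | (rs1.toLong << 15) |
      (funct3.toLong << 12) | (rd.toLong << 7) | opcode.toLong

  // S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
  def sType(imm: Int, rs2: Int, rs1: Int, funct3: Int, opcode: Int): Long = {
    val i = imm & 0xfff
    ((i >> 5).toLong << 25) | (rs2.toLong << 20) | (rs1.toLong << 15) |
      (funct3.toLong << 12) | ((i & 0x1f).toLong << 7) | opcode.toLong
  }

  // Format a 32-bit word as 0x followed by 8 lowercase hex digits.
  def hex(word: Long): String = f"0x$word%08x"
}
```

With these helpers, sub x13, x13, x12 is rType(0x20, 12, 13, 0, 13, 0x33), lw x10, 12(x5) is iType(12, 5, 2, 10, 0x03), and sw x10, 0(x2) is sType(0, 10, 2, 2, 0x23).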

  • R-type

sub a3, a3, a2
Assembly = sub x13, x13, x12
Binary = 0100 0000 1100 0110 1000 0110 1011 0011
Hexadecimal = 0x40c686b3
ID stage
screenshot 2023-11-30 182748
io_reg_write_enable indicates whether the instruction writes its result back to the register file; it is asserted for this R-type instruction.

EX stage
screenshot 2023-11-30 184252
alu_op has successfully retrieved the value from the register and is ready to perform operations using it

Reg
screenshot 2023-11-30 184629
In this stage, the register value changes after the next clock edge.

  • I-type
    lw a0, 12(t0)
    Assembly = lw x10, 12(x5)
    Binary = 0000 0000 1100 0010 1010 0101 0000 0011
    Hexadecimal = 0x00c2a503

ID stage
screenshot 2023-11-30 185308
mem_read_enable and reg_write_enable have been set to extract data from memory addresses and prepare for writing into registers.
EX stage
screenshot 2023-11-30 185728
We can obtain the address 00001308 read from the registers
Reg stage
screenshot 2023-11-30 190315
In this stage, the register value changes after the next clock edge.

  • S-type
    sw a0, 0(sp)

Assembly = sw x10, 0(x2)
Binary = 0000 0000 1010 0001 0010 0000 0010 0011
Hexadecimal = 0x00a12023

ID stage
screenshot 2023-11-30 190645
io_reg_write_enable indicates whether the instruction writes to the register file; an S-type store writes to memory instead, so it is not asserted here.

EX stage
screenshot 2023-11-30 191010
We can observe that alu_op is determined by the changes in alu_op_source, which in turn affects the data in the register

5. Conclusion

Through this practical assignment, I have come to realize my own shortcomings and have learned a new programming language. Going through Lab 3 step by step to understand the architecture of a single-cycle CPU has given me a deeper understanding of the essence of computer architecture and its design. Perhaps in the future, there may be assignments related to GPU design that will allow us to delve even further into the implications and principles behind computer components. I also look forward to continuously learning through the guidance of our teacher and pushing myself to bridge the significant gap between myself and those who excel in the field.

6. References

Construct a single-cycle RISC-V CPU with Chisel
Single-Cycle Processor
Building a RISC-V Processor
Datapath Control
Chisel Breakdown 3