Assignment3: single-cycle RISC-V CPU

contributed by < GliAmanti >

Installation

My OS: Ubuntu 22.04 LTS

Prepare GTKWave for Evaluation

sudo apt install build-essential verilator gtkwave

Prepare sbt for Running Scala Code

  1. Install SDKMAN.
    ​​​​curl -s "https://get.sdkman.io" | bash
    
  2. Open a new terminal window and run the following command.
    source "$HOME/.sdkman/bin/sdkman-init.sh"
    
  3. Install JDK and sbt by using SDKMAN.
    ​​​ sdk install java 11.0.21-tem
    ​​​ sdk install sbt
    

Remember to repeat step 2 every time you open a new terminal to run sbt.

Description of Hello World in Chisel

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U;
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
  • led is the output of the Hello module.
  • CNT_MAX is the maximum counter value.
  • cntReg is the counter.
  • blkReg holds the current state of the LED.

cntReg increments by 1 every clock cycle. When cntReg equals CNT_MAX, namely 24999999, it is reset to 0 and blkReg toggles. The value of blkReg drives the output led.
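
The counter behaviour described above can be checked with a small software model. This is only a sketch in plain Scala (no Chisel dependency); the names BlinkModel, step and run are hypothetical and do not appear in the lab code.

```scala
// Plain-Scala model of the Hello module's counter logic (illustrative only).
object BlinkModel {
  // State after one clock edge, given the current counter and LED state.
  def step(cnt: Long, led: Int, cntMax: Long): (Long, Int) =
    if (cnt == cntMax) (0L, led ^ 1) // wrap the counter and toggle the LED
    else (cnt + 1, led)              // otherwise just count up

  // Apply `cycles` clock edges starting from the reset state (cnt = 0, led = 0).
  def run(cycles: Int, cntMax: Long): (Long, Int) =
    (0 until cycles).foldLeft((0L, 0)) { case ((c, l), _) => step(c, l, cntMax) }
}
```

With a small cntMax of 3, the LED toggles every 4 cycles. By the same reasoning, CNT_MAX = 24999999 gives one toggle every 25,000,000 cycles, which would be 0.5 s if the clock runs at 50 MHz (an assumption suggested by the constant 50000000, not stated in the note).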

Complete Version of MyCPU (Lab3)

We only have to fill in the blanks in InstructionFetch.scala, InstructionDecode.scala, Execute.scala and CPU.scala.

Here is my implementation of lab3, which is forked from ca2023-lab3.

The following figure shows the RV32I datapath.

[Figure: RV32I datapath (image not available)]

How to Run

  1. Get the repository.
    ​​​​git clone https://github.com/GliAmanti/ComputerArchitecture_HW3.git
    ​​​​cd ComputerArchitecture_HW3
    
  2. To simulate and run tests for this project, execute the following commands under the ComputerArchitecture_HW3 directory.
    ​​​​sbt test
    

    The output message will be:

    ​​​​[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
    ​​​​[info] loading settings for project computerarchitecture_hw3-build from plugins.sbt ...
    ​​​​[info] loading project definition from /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/project
    ​​​​[info] loading settings for project root from build.sbt ...
    ​​​​[info] set current project to mycpu (in build file:/home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/)
    ​​​​[info] compiling 1 Scala source to /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/target/scala-2.13/test-classes ...
    ​​​​[info] ByteAccessTest:
    ​​​​[info] Single Cycle CPU
    ​​​​[info] - should store and load a single byte
    ​​​​[info] InstructionFetchTest:
    ​​​​[info] InstructionFetch of Single Cycle CPU
    ​​​​[info] - should fetch instruction
    ​​​​[info] InstructionDecoderTest:
    ​​​​[info] InstructionDecoder of Single Cycle CPU
    ​​​​[info] - should produce correct control signal
    ​​​​[info] ExecuteTest:
    ​​​​[info] Execution of Single Cycle CPU
    ​​​​[info] - should execute correctly
    ​​​​[info] FibonacciTest:
    ​​​​[info] Single Cycle CPU
    ​​​​[info] - should recursively calculate Fibonacci(10)
    ​​​​[info] QuicksortTest:
    ​​​​[info] Single Cycle CPU
    ​​​​[info] - should perform a quicksort on 10 numbers
    ​​​​[info] RegisterFileTest:
    ​​​​[info] Register File of Single Cycle CPU
    ​​​​[info] - should read the written content
    ​​​​[info] - should x0 always be zero
    ​​​​[info] - should read the writing content
    ​​​​[info] Run completed in 6 seconds, 952 milliseconds.
    ​​​​[info] Total number of tests run: 9
    ​​​​[info] Suites: completed 7, aborted 0
    ​​​​[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
    ​​​​[info] All tests passed.
    ​​​​[success] Total time: 10 s, completed Nov 30, 2023, 1:24:28 PM
    
  3. If you want to run a single test, such as running only InstructionDecoderTest, execute the following command:
    ​​​​sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
    

    The output message will be:

    ​​​​[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
    ​​​​[info] loading settings for project computerarchitecture_hw3-build from plugins.sbt ...
    ​​​​[info] loading project definition from /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/project
    ​​​​[info] loading settings for project root from build.sbt ...
    ​​​​[info] set current project to mycpu (in build file:/home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/)
    ​​​​[info] InstructionDecoderTest:
    ​​​​[info] InstructionDecoder of Single Cycle CPU
    ​​​​[info] - should produce correct control signal
    ​​​​[info] Run completed in 2 seconds, 509 milliseconds.
    ​​​​[info] Total number of tests run: 1
    ​​​​[info] Suites: completed 1, aborted 0
    ​​​​[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
    ​​​​[info] All tests passed.
    ​​​​[success] Total time: 3 s, completed Nov 30, 2023, 1:29:16 PM
    

Description of Unit tests

If you see the icon [icon image not available], please check the waveform for the details.

InstructionFetchTest

This test verifies whether the InstructionFetch module brings the PC to the right address.

Refrain from copying and pasting your solution directly into the HackMD note. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.

There are 2 cases:

1. No jump

  • jump_flag_id will be set to false, then PC := PC + 4.
  • In the waveform, jump_flag_id is 0, so instruction_address changes from 0x1000 to 0x1004.

2. Jump

  • jump_flag_id will be set to true, then PC := jump_address_id.
  • In this test, PC will jump to entry, namely 0x1000.
  • In the waveform, jump_flag_id is 1, so instruction_address changes from 0x1008 to 0x1000.
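
The two cases above amount to a two-input selection for the next PC. A minimal sketch in plain Scala follows; FetchModel and nextPc are hypothetical names used only for illustration, not the Chisel code of the lab.

```scala
// Software model of the next-PC selection (illustrative only).
object FetchModel {
  // Next PC: the jump target when a jump is taken, otherwise PC + 4.
  // The mask keeps the value within 32 bits, like the hardware register.
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    (if (jumpFlag) jumpAddress else pc + 4) & 0xFFFFFFFFL
}
```

For the values seen in the test, FetchModel.nextPc(0x1000L, false, 0x1000L) gives 0x1004 and FetchModel.nextPc(0x1008L, true, 0x1000L) gives 0x1000.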
Waveform

image

InstructionDecoderTest

This test verifies whether the InstructionDecoder module correctly distinguishes the opcode and passes the right data for each corresponding instruction.

Refrain from copying and pasting your solution directly into the HackMD note. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.

There are many cases but this test only verifies 3:

1. S-type

  • This kind of instruction adds the contents of rs1 and simm12 and uses the result as the memory address, so:
    aluop1_source := ALUOp1Source.Register. That is, mux input 0 is selected.
    aluop2_source := ALUOp2Source.Immediate. That is, mux input 1 is selected.

    These values follow the definitions in InstructionDecoder:

    ​​​​object ALUOp1Source {
    ​​​​  val Register           = 0.U(1.W)
    ​​​​  val InstructionAddress = 1.U(1.W)
    ​​​​}
    
    ​​​​object ALUOp2Source {
    ​​​​  val Register  = 0.U(1.W)
    ​​​​  val Immediate = 1.U(1.W)
    ​​​​}
    

    SW/SH/SB

    ​​​​sw/sh/sb rs2, rs1, simm12
    
  • In the waveform, the instruction 00A02223 is identified as S-type by its lowest 7 bits (the opcode), 010 0011 in binary, so memory_write_enable is set to 1. reg1_read_address = 0 and immediate = 4 are passed to the next stage.
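
The field extraction for the S-type example can be reproduced in software. The sketch below follows the standard RV32I S-type layout; STypeDecode and SType are hypothetical names and this is not the InstructionDecoder module itself.

```scala
// Software model of S-type field extraction (illustrative only).
object STypeDecode {
  final case class SType(opcode: Int, funct3: Int, rs1: Int, rs2: Int, imm: Int)

  def decode(inst: Int): SType = {
    val opcode = inst & 0x7f          // inst[6:0]
    val funct3 = (inst >>> 12) & 0x7  // inst[14:12]
    val rs1    = (inst >>> 15) & 0x1f // inst[19:15]
    val rs2    = (inst >>> 20) & 0x1f // inst[24:20]
    // imm[11:5] = inst[31:25] (arithmetic shift sign-extends), imm[4:0] = inst[11:7]
    val imm = ((inst >> 25) << 5) | ((inst >>> 7) & 0x1f)
    SType(opcode, funct3, rs1, rs2, imm)
  }
}
```

STypeDecode.decode(0x00A02223) yields opcode 0x23, funct3 = 2 (sw), rs1 = 0, rs2 = 10 and imm = 4, which agrees with reg1_read_address = 0 and immediate = 4 in the waveform.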

2. lui

  • This instruction loads uimm20 into the upper 20 bits of rd and sets the remaining bits to 0, so:
    aluop1_source := ALUOp1Source.Register. That is, mux input 0 is selected.
    aluop2_source := ALUOp2Source.Immediate. That is, mux input 1 is selected.

    LUI (Load upper immediate)

    ​​​​lui rd, uimm20
    

3. add

  • This instruction adds the contents of rs1 and rs2, so:
    aluop1_source := ALUOp1Source.Register. That is, mux input 0 is selected.
    aluop2_source := ALUOp2Source.Register. That is, mux input 0 is selected.

    ADD

    ​​​​add rd, rs1, rs2
    
Waveform

image

ExecuteTest

This test verifies whether the Execute module makes the right decision for branch instructions.

Refrain from copying and pasting your solution directly into the HackMD note. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.

There are many cases but this test only verifies add and beq.

1. add

  • ALU adds op1 and op2, according to the funct.
  • In this test, op1 is set to reg1_data, op2 is set to reg2_data. Both of them are random integers.
  • jump_flag_id will be set to false.

2. beq

  • The branch comparator compares the contents of rs1 and rs2, while the ALU computes the jump address, so:
    aluop1_source := ALUOp1Source.InstructionAddress. That is, mux input 1 is selected.
    aluop2_source := ALUOp2Source.Immediate. That is, mux input 1 is selected.

    BEQ

    ​​​​beq rs1, rs2, simm13
    
  • Equal
    • If reg1_data == reg2_data, jump_flag_id will be set to true.
      Then jump_address := immediate + instruction_address.
    • In this test, jump_address := 2 + 2
    • In the waveform, reg1_data = 9 and reg2_data = 9, so the equal condition is satisfied and jump_flag is asserted.
  • Not equal
    • If reg1_data != reg2_data, jump_flag_id will be set to false.
      Then jump_address := immediate + instruction_address.
    • In this test, jump_address := 2 + 2
    • In the waveform, reg1_data = 17FE18F6 and reg2_data = 15D9D5DD, so the equal condition is not satisfied and jump_flag is not asserted.
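
The beq behaviour described above can be summarised as a pair of independent computations: the comparison decides the flag, and the adder always produces the target. This is a sketch in plain Scala with hypothetical names (BranchModel, beq), not the Execute module itself.

```scala
// Software model of beq in the Execute stage (illustrative only).
object BranchModel {
  // Returns (jumpFlag, jumpAddress).
  // jumpAddress is computed whether or not the branch is taken.
  def beq(reg1: Int, reg2: Int, instructionAddress: Int, immediate: Int): (Boolean, Int) =
    (reg1 == reg2, instructionAddress + immediate)
}
```

With the values from the test, BranchModel.beq(9, 9, 2, 2) gives (true, 4), and BranchModel.beq(0x17FE18F6, 0x15D9D5DD, 2, 2) gives (false, 4).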
Waveform

image

CPUTest

This test verifies whether all components in MyCPU function properly.

Refrain from copying and pasting your solution directly into the HackMD note. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.

There are 3 test cases in CPUTest:

1. FibonacciTest

  • This class reads the file fibonacci.asmbin, which calculates Fibonacci(10).
  • In this test, mem_debug_read_address will be set to 4, then mem_debug_read_data will get 55.

2. QuicksortTest

  • This class reads the file quicksort.asmbin, which performs a Quick Sort on 10 numbers.
  • In this test, mem_debug_read_address will be set to 4 * i, then mem_debug_read_data will get i - 1.

3. ByteAccessTest

  • This class reads the file sb.asmbin, which stores and loads a single byte.
  • In test case 1, regs_debug_read_address will be set to 5, then regs_debug_read_data will get 0xdeadbeef.
  • In test case 2, regs_debug_read_address will be set to 6, then regs_debug_read_data will get 0xef.
  • In test case 3, regs_debug_read_address will be set to 1, then regs_debug_read_data will get 0x15ef.
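
The values expected by ByteAccessTest are consistent with a byte store that only replaces one byte lane of a 32-bit word. The sketch below assumes that sb.asmbin first stores the low byte of 0xdeadbeef into a zeroed word and then stores 0x15 at the next byte offset; the program itself is not shown in this note, so that sequence is an assumption, and ByteStoreModel is a hypothetical name.

```scala
// Software model of a byte store into a 32-bit word (illustrative only).
object ByteStoreModel {
  // Write the low byte of `value` into byte lane `offset` (0 to 3) of `word`.
  def storeByte(word: Int, offset: Int, value: Int): Int = {
    val shift = offset * 8
    (word & ~(0xff << shift)) | ((value & 0xff) << shift)
  }
}
```

Under that assumption, ByteStoreModel.storeByte(0, 0, 0xdeadbeef) gives 0xef, and ByteStoreModel.storeByte(0xef, 1, 0x15) gives 0x15ef, matching test cases 2 and 3.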

My guess: the variables inside the withClock() block are implicitly reset, so their waveforms, such as reg_debug_address and reg_debug_data, do not show up in GTKWave.

Adaptation of Homework2

Here is my adaptation of hw2.

How to Run

To run hw2 with MyCPU, I made the following changes.

  1. Remove the code related to rdcycle and rdcycleh in myHammingDist.S.

  2. Store the results to the memory address I test in CPUTest.scala.

    ​​​​# base memory address to store the result
    ​​​​li s9, 0x4
    ​​​​
    ​​​​......
    ​​​​
    ​​​​ # store the result for hw3
    ​​​​slli t1, s1, 2
    ​​​​add t0, s9, t1
    ​​​​sw a0, 0(t0)    
    
  3. Add the corresponding test to CPUTest.scala.

    HammingTest
    class HammingTest extends AnyFlatSpec with ChiselScalatestTester {
      behavior.of("Single Cycle CPU")
      it should "calculate the Hamming distance between two 64-bit integers" in {
    ​​​​    test(new TestTopModule("myHammingDist.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
    ​​​​      for (i <- 1 to 50) {
    ​​​​        c.clock.step(1000)
    ​​​​        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
    ​​​​      }
    
    ​​​​      c.io.mem_debug_read_address.poke(4.U)
    ​​​​      c.clock.step()
    ​​​​      c.io.mem_debug_read_data.expect(21.U)
    
    ​​​​      c.io.mem_debug_read_address.poke(8.U) 
    ​​​​      c.clock.step()
    ​​​​      c.io.mem_debug_read_data.expect(63.U)
    
    ​​​​      c.io.mem_debug_read_address.poke(12.U) 
    ​​​​      c.clock.step()
    ​​​​      c.io.mem_debug_read_data.expect(0.U)
    ​​​​    }
    ​​​​  }
    ​​​​}
    

    If you check the result through a memory address, make sure c.clock.step(1000) is inside your loop. Otherwise, you will get an error message.

  4. Add myHammingDist.S to csrc directory.

  5. Modify the Makefile in the csrc directory by adding myHammingDist.asmbin to BINS (the line marked with +).

    Makefile
    ​​​​CROSS_COMPILE ?= riscv-none-elf-
    
    ​​​​ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
    ​​​​CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
    ​​​​LDFLAGS = --oformat=elf32-littleriscv
    
    ​​​​AS := $(CROSS_COMPILE)as
    ​​​​CC := $(CROSS_COMPILE)gcc
    ​​​​LD := $(CROSS_COMPILE)ld
    ​​​​OBJCOPY := $(CROSS_COMPILE)objcopy
    
    ​​​​%.o: %.S
    ​​​​    $(AS) -R $(ASFLAGS) -o $@ $<
    ​​​​%.elf: %.S
    ​​​​    $(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
    ​​​​    $(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)
    ​​​​%.elf: %.c init.o
    ​​​​    $(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
    ​​​​    $(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o
    
    ​​​​%.asmbin: %.elf
    ​​​​    $(OBJCOPY) -O binary -j .text -j .data $< $@
    
    ​​​​BINS = \
    ​​​​    fibonacci.asmbin \
    ​​​​    hello.asmbin \
    ​​​​    mmio.asmbin \
    ​​​​    quicksort.asmbin \
    ​​​​    sb.asmbin \
    ​​​​+    myHammingDist.asmbin 
    
    ​​​​# Clear the .DEFAULT_GOAL special variable, so that the following turns
    ​​​​# to the first target after .DEFAULT_GOAL is not set.
    ​​​​.DEFAULT_GOAL :=
    
    ​​​​all: $(BINS)
    
    ​​​​update: $(BINS)
    ​​​​    cp -f $(BINS) ../src/main/resources
    
    ​​​​clean:
    ​​​​    $(RM) *.o *.elf *.asmbin
    
  6. Generate myHammingDist.asmbin.

    ​​​​make update
    

    or

    ​​​​make clean
    ​​​​make
    
  7. Run the test.

    ​​​​sbt "testOnly riscv.singlecycle.HammingTest"
    

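As a reference for the values checked in HammingTest, the quantity being computed and the result addresses can be modelled in a few lines. This is a sketch in plain Scala with hypothetical names (HammingModel, distance, resultAddress); it assumes s1 is a zero-based test-case index, which the excerpt of myHammingDist.S above does not state explicitly.

```scala
// Software reference for the adapted hw2 program (illustrative only).
object HammingModel {
  // Hamming distance: number of bit positions in which a and b differ.
  def distance(a: Long, b: Long): Int = java.lang.Long.bitCount(a ^ b)

  // Address of the index-th result, mirroring
  //   slli t1, s1, 2 ; add t0, s9, t1   with s9 = 0x4.
  def resultAddress(index: Int, base: Int = 4): Int = base + (index << 2)
}
```

Under that assumption, indices 0, 1 and 2 map to addresses 4, 8 and 12, which are the addresses poked in HammingTest.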
How to Analyze

  1. After the first run and every time you modify the Chisel code, you need to execute the following command in the project’s root directory to generate Verilog files.

    ​​​​make verilator
    
  2. Load myHammingDist.asmbin, run the simulation with -time 1000, and save the waveform to myHammingDist.vcd.

    ​​​​./run-verilator.sh -instruction src/main/resources/myHammingDist.asmbin -time 1000 -vcd myHammingDist.vcd
    

    The output message will be:

    ​​​​-time 1000
    ​​​​-memory 1048576
    ​​​​-instruction src/main/resources/myHammingDist.asmbin
    ​​​​[-------------------->] 100%
    
  3. Use GTKWave to view the output waveform file myHammingDist.vcd.

    ​​​​gtkwave myHammingDist.vcd
    

Description of Waveform

Taking the instruction lw a0, 0(s2) as an example, I analyze how MyCPU handles it in each stage.

Instruction Fetch

image

  • Instruction 00092503 is lw a0, 0(s2). (Line 52 in my code.)
  • It is neither a branch (SB-type) nor a jump (UJ-type) instruction, so:
    • jump_flag_id = 0,
    • PC = PC + 4.

Please check here for instruction conversion.

Instruction Decode

image

  • lw loads data from the memory address rs1 + imm into rd, so:
    • memory_read_enable = 1
    • memory_write_enable = 0
    • reg_write_address = 0xA (a0 = x10)
    • reg_write_enable = 1
    • rs1 = 0x12 (s2 = x18)
    • rd = 0xA
    • immediate = 0
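
The register numbers and immediate listed above can be recovered from the encoding 00092503 with the standard RV32I I-type layout. The following is a sketch in plain Scala; ITypeDecode and IType are hypothetical names, not part of MyCPU.

```scala
// Software model of I-type field extraction (illustrative only).
object ITypeDecode {
  final case class IType(opcode: Int, rd: Int, funct3: Int, rs1: Int, imm: Int)

  def decode(inst: Int): IType =
    IType(
      opcode = inst & 0x7f,          // inst[6:0]
      rd     = (inst >>> 7) & 0x1f,  // inst[11:7]
      funct3 = (inst >>> 12) & 0x7,  // inst[14:12]
      rs1    = (inst >>> 15) & 0x1f, // inst[19:15]
      imm    = inst >> 20            // inst[31:20], sign-extended by the arithmetic shift
    )
}
```

ITypeDecode.decode(0x00092503) yields opcode 0x03 (load), rd = 10 (0xA), funct3 = 2 (lw), rs1 = 18 (0x12) and imm = 0.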

Execute

image

  • According to ALUControl and ALU, alu_funct = ALUFunctions.add, namely 1.
    ​​​​is(InstructionTypes.L) {
    ​​​​io.alu_funct := ALUFunctions.add
    ​​​​}
    
    ​​​​object ALUFunctions extends ChiselEnum {
    ​​​​  val zero, add, sub, sll, slt, xor, or, and, srl, sra, sltu = Value
    ​​​​}    
    
  • aluop1_source = 0, so op1 = reg1_data.
  • aluop2_source = 1, so op2 = immediate.
  • The ALU adds op1 = FFFFFFF4 and op2 = 00000000, giving the result FFFFFFF4.
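
The operand selection and addition in this stage can be modelled as follows. This is a sketch in plain Scala; ExecuteModel and aluAdd are hypothetical names, and the 0/1 encodings follow the ALUOp1Source and ALUOp2Source definitions quoted earlier in this note.

```scala
// Software model of operand selection plus the ALU add (illustrative only).
object ExecuteModel {
  // op1Source: 0 = register, 1 = instruction address
  // op2Source: 0 = register, 1 = immediate
  def aluAdd(op1Source: Int, op2Source: Int,
             reg1: Int, reg2: Int, instructionAddress: Int, immediate: Int): Int = {
    val op1 = if (op1Source == 0) reg1 else instructionAddress
    val op2 = if (op2Source == 0) reg2 else immediate
    op1 + op2 // Int arithmetic wraps at 32 bits, like a 32-bit adder
  }
}
```

For the lw example, ExecuteModel.aluAdd(0, 1, 0xFFFFFFF4, 0, 0, 0) returns 0xFFFFFFF4; the reg2 and instruction address arguments are ignored for this selection.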

Memory Access

image

  • memory_read_enable = 1, so Memory outputs read_data = 00000000, the content stored at memory address FFFFFFF4.

Write Back

image

  • According to WriteBack and InstructionDecode, regs_write_source = 1, so regs_write_data = memory_read_data, namely 00000000.
    ​​​​ io.regs_write_data := MuxLookup(
    ​​​​    io.regs_write_source,
    ​​​​    io.alu_result,
    ​​​​    IndexedSeq(
    ​​​​      RegWriteSource.Memory                 -> io.memory_read_data,
    ​​​​      RegWriteSource.NextInstructionAddress -> (io.instruction_address + 4.U)
    ​​​​    )
    ​​​​  )
    
    ​​​​object RegWriteSource {
    ​​​​  val ALUResult = 0.U(2.W)
    ​​​​  val Memory    = 1.U(2.W)
    ​​​​  val NextInstructionAddress = 3.U(2.W)
    ​​​​}
    
  • The result is written back to the register at write_address = 0xA (a0).
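
The MuxLookup above selects among three sources, with the ALU result as the default. A plain-Scala sketch of the same selection is shown below; WriteBackModel and regsWriteData are hypothetical names, and the encodings are copied from the RegWriteSource object quoted above.

```scala
// Software model of the write-back source selection (illustrative only).
object WriteBackModel {
  val ALUResult              = 0
  val Memory                 = 1
  val NextInstructionAddress = 3

  def regsWriteData(source: Int, aluResult: Int,
                    memoryReadData: Int, instructionAddress: Int): Int =
    source match {
      case Memory                 => memoryReadData
      case NextInstructionAddress => instructionAddress + 4
      case _                      => aluResult // default case, as in MuxLookup
    }
}
```

For the lw example, source = 1 selects memory_read_data, so the value written to a0 is 00000000 regardless of the ALU result FFFFFFF4.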