
Homework 3 single-cycle RISC-V CPU

contributed by < Hotmercury >

Setup

Learning Jupyter (Chisel Bootcamp)

Install with Docker:

docker run -it --rm -p 8888:8888 ucbbar/chisel-bootcamp

Or clone the repository:

git clone https://github.com/freechipsproject/chisel-bootcamp.git

Open it in VS Code:

paste the Docker URL as the Jupyter server
select the Scala kernel

If I execute the following code, it fails with an error:

val path = System.getProperty("user.dir") + "/source/load-ivy.sc"
interp.load.module(ammonite.ops.Path(java.nio.file.FileSystems.getDefault().getPath(path)))

ammonite.util.CompilationError: Failed to resolve ivy dependencies:/coursier_cache/.structure.lock (Permission denied)

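This is a Docker permission issue: the container user cannot write to coursier_cache. We can resolve it either by changing ownership with chown or by relaxing permissions with chmod. Here we enter the container as root (user 0) and give the cache to the bootcamp user: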

docker exec -u 0 -it {docker_id} bash
chown -R bootcamp:bootcamp coursier_cache/

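If we don't want to use a token, start the container with bash, then copy the Jupyter launch command from the last line of the Dockerfile and run it with an empty token: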

$ docker run -it --rm -p 8888:8888 ucbbar/chisel-bootcamp bash
$ jupyter notebook --no-browser --ip 0.0.0.0 --port 8888 --NotebookApp.token=''

Lab3

Install the dependent packages

$ sudo apt install build-essential verilator gtkwave

Install sbt on Linux

sbt (Scala Build Tool) relies on Scala and needs a JVM (Java Virtual Machine).

  • Local install with SDKMAN (software development kit manager)
# Install sdkman
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"

# Install Eclipse Temurin JDK 11
$ sdk install java 11.0.21-tem 
$ sdk install sbt

Add a new PPA (Personal Package Archive) to the system and install Apptainer:

$ sudo add-apt-repository -y ppa:apptainer/ppa
$ sudo apt update
$ sudo apt install -y apptainer

Additional test: Ubuntu 18.04 using Apptainer

$ singularity build ubuntu_18.04.sif docker://ubuntu:18.04
$ singularity shell ubuntu_18.04.sif

However, packages cannot be installed inside the read-only .sif image, so we need to build a writable sandbox:

$ singularity build --fakeroot --sandbox ubuntu_18.04 docker://ubuntu:18.04
$ singularity shell --fakeroot --writable ubuntu_18.04
$ apt-get update
$ apt-get install {package}

This was not successful; to be revisited later.

Fundamental concepts

val: comes from Scala; an immutable value (cannot be reassigned)
var: a mutable variable (can be reassigned)
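A minimal plain-Scala illustration of the difference (nothing Chisel-specific; the object name ValVarDemo is hypothetical):

```scala
object ValVarDemo {
  // `val` binds a name once; writing `width = 16` here would not compile.
  val width: Int = 8

  // `var` can be reassigned after its declaration.
  var count: Int = 0

  // Increments the mutable counter and returns its new value.
  def bump(): Int = {
    count += 1
    count
  }
}
```

In Chisel code, hardware such as wires and registers is typically bound with val, and its value is changed by connecting to it with := rather than by reassigning the Scala name.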

Basic Types

Type   Meaning
Bits   Raw collection of bits
SInt   Signed integer number
UInt   Unsigned integer number
Bool   Boolean

Bundles and Vecs

Chisel Bundles group several wires under named fields, like a struct in C.

import chisel3._

class FIFOInput extends Bundle {
    val ready = Output(Bool())    // Chisel 3 form of Bool(OUTPUT)
    val data  = Input(UInt(32.W)) // Chisel 3 form of Bits(INPUT, 32)
    val enq   = Input(Bool())
}

Now we can create instances of FIFOInput

val jonsIO = new FIFOInput

Bundles are typically used to define the interfaces of modules. The Bundle flip operator (Flipped(...) in Chisel 3) creates the "opposite" Bundle by reversing each field's direction: inputs become outputs and outputs become inputs.

When to flip: two connected modules share many port names, and each output of one module is an input of the other.

class MyFloat extends Bundle {
    val sign        = Bool()
    val exponent    = UInt(8.W)
    val significand = UInt(23.W)
}
val x  = Wire(new MyFloat) // inside a Module
val xs = x.sign
class BigBundle extends Bundle {
    val myVec = Vec(5, SInt(23.W)) // vector of 5 23-bit signed integers
    val flag  = Bool()
    val f     = new MyFloat
}

Note: Vec is not a memory array; it functions as a collection of wires (or registers).

Literals

Bits("ha")   // Chisel 2 style
"ha".U       // Chisel 3 style: hexadecimal a, i.e. 10
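To make the prefix letter concrete, here is a small software model of how a prefixed literal string maps to a number. It is not Chisel's actual parser; LiteralModel is a hypothetical name, and it assumes the prefixes h (hex), b (binary), o (octal) and d (decimal):

```scala
object LiteralModel {
  // Maps the prefix letter to a radix, then parses the remaining digits.
  def parse(lit: String): BigInt = {
    val radix = lit.head match {
      case 'h' => 16
      case 'b' => 2
      case 'o' => 8
      case 'd' => 10
      case c   => throw new IllegalArgumentException(s"unknown prefix: $c")
    }
    BigInt(lit.tail, radix)
  }
}
```

All four spellings "ha", "b1010", "o12" and "d10" denote the same value 10.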

Ports

val rdy = Output(Bool())     // Chisel 2: Bool(OUTPUT)
val in  = Input(new MyFloat) // Chisel 2: new MyFloat().asInput

Modules

Modules establish hierarchy within the generated circuit.

import chisel3._

class Mux2 extends Module {
    val io = IO(new Bundle {
        val select = Input(UInt(1.W))
        val in0    = Input(UInt(1.W))
        val in1    = Input(UInt(1.W))
        val out    = Output(UInt(1.W))
    })
    io.out := (io.select & io.in1) | (~io.select & io.in0)
}
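The Mux2 equation can be checked exhaustively with a plain-Scala model of the same Boolean expression on 1-bit values (Mux2Model is a hypothetical name, not part of Chisel):

```scala
object Mux2Model {
  // out = (select & in1) | (~select & in0), restricted to 1-bit values.
  def mux2(select: Int, in0: Int, in1: Int): Int =
    ((select & in1) | ((~select & 1) & in0)) & 1
}
```

For all eight input combinations the output equals in1 when select is 1 and in0 otherwise.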

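Combinational logic:

val wire     = Wire(UInt(8.W))    // 8-bit wire, must be driven with :=
val wireInit = WireInit(0.U(8.W)) // 8-bit wire with default value 0

Sequential logic (positive-edge-triggered registers):

val reg     = Reg(UInt(8.W))      // 8-bit register without reset value
val regInit = RegInit(0.U(8.W))   // 8-bit register reset to 0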

Hello World in Chisel

Based on what I had learned so far, my first attempt was:

class Hello extends module{
    val io = new Bundle{
        val out = Bit(4.W)
    }
    val word = "hello world".B
    out := word
}

The correct version is the classic Chisel "Hello World", a blinking LED:

import chisel3._

class Hello extends Module {
    val io = IO(new Bundle {
        val led = Output(UInt(1.W))
    })
    val CNT_MAX = (5000000 / 2 - 1).U
    val cntReg = RegInit(0.U(32.W))
    val blkReg = RegInit(0.U(1.W))
    cntReg := cntReg + 1.U
    when(cntReg === CNT_MAX) {
        cntReg := 0.U
        blkReg := ~blkReg // toggle the LED register
    }
    io.led := blkReg
}
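A cycle-by-cycle software model of the Hello counter (hypothetical BlinkModel, using a small cntMax so the trace stays readable) shows that the LED toggles every cntMax + 1 cycles; with CNT_MAX = 2499999 that is every 2,500,000 cycles:

```scala
object BlinkModel {
  // Same value as the Hello module: 5000000 / 2 - 1.
  val CntMax: Int = 5000000 / 2 - 1

  // LED value visible in each of `cycles` clock cycles after reset.
  def ledTrace(cntMax: Int, cycles: Int): Seq[Int] = {
    var cnt = 0 // models cntReg
    var led = 0 // models blkReg
    val trace = Seq.newBuilder[Int]
    for (_ <- 0 until cycles) {
      trace += led
      // Clock edge: wrap and toggle at cntMax, otherwise count up.
      if (cnt == cntMax) { cnt = 0; led = 1 - led } else { cnt += 1 }
    }
    trace.result()
  }
}
```

With cntMax = 2 the first eight cycles give 0, 0, 0, 1, 1, 1, 0, 0.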

Chisel Tutorial

Getting the Repository

$ git clone https://github.com/ucb-bar/chisel-tutorial.git
$ cd chisel-tutorial
$ git fetch origin
$ git checkout release

Testing your system: first make sure that you have sbt (the Scala build tool) installed; see the sbt section above for details.

MyCPU


First, I wanted to learn how to start a Hello World example with sbt, and I successfully ran the following program:

object Hello {
  def main(args: Array[String]): Unit = {
    println("Hello World")
  }
}

Run hello.scala with:

sbt run

Next, I moved on to MyCPU. Running $ sbt test reported six errors, so I fixed them one at a time, running each test individually.

IF

$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"

testOnly
The testOnly task accepts a whitespace-separated list of test names to run.
Scala object
An object is a singleton: you cannot instantiate it (no need to call new) and simply reference it directly, similar to static members of a Java class.

How to solve the error

We need to handle the program counter in two cases:

  1. jump: when the jump condition holds, the next PC is the jump address
  2. otherwise, the next PC is pc + 4

Code can be found in src/main/scala/riscv/core/InstructionFetch.scala.
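A plain-Scala sketch of the next-PC choice described above. The names are simplified and hypothetical (they are not the lab's port names), and only the two cases from the list are modeled:

```scala
object FetchModel {
  // Next PC: the jump target when the jump flag is set, otherwise pc + 4.
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (jumpFlag) jumpAddress else (pc + 4) & 0xffffffffL // wrap at 32 bits
}
```

For example, nextPc(0x1000, false, 0x2000) gives 0x1004 and nextPc(0x1000, true, 0x2000) gives 0x2000; the addresses are arbitrary examples.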


Another question

How do instruction_read_data and instruction work together correctly?


ID

Decode: interpret the instruction and read the register data.

$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"

Some Chisel syntax used in the decoder can be looked up in the Chisel documentation:

Fill

  • Fill(n, x): concatenates n copies of x
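In the decoder, Fill is commonly combined with Cat to sign-extend immediates; in Chisel the I-type case can be written Cat(Fill(20, inst(31)), inst(31, 20)). The plain-Scala model below (hypothetical FillModel) replicates a w-bit value n times and builds that immediate for 0xeef28293, i.e. addi x5, x5, -273:

```scala
object FillModel {
  // Models Fill(n, x) for a w-bit x: n copies of x concatenated.
  def fill(n: Int, x: Long, w: Int): Long = {
    val mask = (1L << w) - 1
    (0 until n).foldLeft(0L)((acc, _) => (acc << w) | (x & mask))
  }

  // I-type immediate: 20 copies of inst[31] followed by inst[31:20].
  def immI(inst: Long): Long = {
    val sign  = (inst >> 31) & 1
    val imm12 = (inst >> 20) & 0xfff
    (fill(20, sign, 1) << 12) | imm12
  }
}
```

immI(0xeef28293L) yields 0xfffffeef, which is -273 as a signed 32-bit value.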

MuxLookup

  • MuxLookup(idx, default)(Seq(0.U -> a, 1.U -> b)): outputs the value whose key equals idx (a or b here), otherwise default
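A software model of what MuxLookup selects (hypothetical MuxLookupModel; it assumes distinct keys and only reproduces the selected value, whereas the hardware builds a mux tree that evaluates all entries in parallel):

```scala
object MuxLookupModel {
  // Returns the value paired with `key`, or `default` if no key matches.
  // Assumes the keys in `mapping` are distinct.
  def muxLookup[K, V](key: K, default: V)(mapping: Seq[(K, V)]): V =
    mapping.find { case (k, _) => k == key }.map(_._2).getOrElse(default)
}
```

muxLookup(1, "x")(Seq(0 -> "a", 1 -> "b")) returns "b", and a key with no entry returns the default "x".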

instruction type

We can browse the RISC-V manual; searching for the opcode bit patterns (inst[6:0]) below yields the corresponding instruction types.

  • L type : b0000011 : load
  • I type : b0010011 : immediate
  • S type : b0100011 : store
  • RM type : b0110011 : R type (register-register)
  • B type : b1100011 : branch
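A small software model (hypothetical OpcodeModel) that extracts inst[6:0] and names the type, checked against encodings from the byte-access test program such as 0x00052303 (lw x6, 0(x10)); lui is added for comparison:

```scala
object OpcodeModel {
  // inst[6:0] values listed above, plus lui (b0110111).
  val names: Map[Int, String] = Map(
    0x03 -> "L (load)",      // b0000011
    0x13 -> "I (immediate)", // b0010011
    0x23 -> "S (store)",     // b0100011
    0x33 -> "RM (register)", // b0110011
    0x63 -> "B (branch)",    // b1100011
    0x37 -> "lui"            // b0110111
  )

  def opcode(inst: Long): Int = (inst & 0x7f).toInt

  def classify(inst: Long): String = names.getOrElse(opcode(inst), "other")
}
```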

Based on the diagram below, only lui needs its rs1 read address forced to x0, because inst[19:15] of lui holds immediate bits rather than a register index.

io.regs_reg1_read_address := Mux(opcode === Instructions.lui, 0.U(Parameters.PhysicalRegisterAddrWidth), rs1)
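A plain-Scala model of this Mux (hypothetical Rs1Model) makes the reason visible: for lui x5, 0xdeadc (encoded 0xdeadc2b7) the raw inst[19:15] field happens to be 27, which is not a register the instruction uses:

```scala
object Rs1Model {
  val LuiOpcode = 0x37 // b0110111

  // rs1 field: inst[19:15].
  def rawRs1(inst: Long): Int = ((inst >> 15) & 0x1f).toInt

  // Same choice as the Mux: x0 for lui, the decoded rs1 otherwise.
  def regReadAddress1(inst: Long): Int =
    if ((inst & 0x7f).toInt == LuiOpcode) 0 else rawRs1(inst)
}
```

For addi x5, x5, -273 (0xeef28293) the read address is 5, and for lw x6, 0(x10) (0x00052303) it is 10.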

[Figure: diagram]

lab3

I realized that what remains is to complete the logic for memory_read_enable and memory_write_enable. With that, this test passes.

exe

$ sbt "testOnly riscv.singlecycle.ExecuteTest"

We need to complete the alu_control inputs: based on the opcode, funct3 and funct7, ALUControl determines the operation type, such as addi, slli, and so on.

val alu_ctrl = Module(new ALUControl)
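A sketch of the decision ALUControl has to make, covering only the RV32I OP-IMM (0x13) and OP (0x33) opcodes. AluControlModel and the string results are hypothetical; the lab uses its own enumeration of ALU functions:

```scala
object AluControlModel {
  // ALU operation for OP-IMM (0x13) and OP (0x33) instructions.
  def aluOp(opcode: Int, funct3: Int, funct7: Int): String = {
    val alt = (funct7 & 0x20) != 0 // inst[30] selects sub / sra
    (opcode, funct3) match {
      case (0x33, 0)        => if (alt) "sub" else "add" // addi has no sub form
      case (0x13 | 0x33, 0) => "add"
      case (0x13 | 0x33, 1) => "sll"
      case (0x13 | 0x33, 2) => "slt"
      case (0x13 | 0x33, 3) => "sltu"
      case (0x13 | 0x33, 4) => "xor"
      case (0x13 | 0x33, 5) => if (alt) "sra" else "srl"
      case (0x13 | 0x33, 6) => "or"
      case (0x13 | 0x33, 7) => "and"
      case _                => "add" // assumption: e.g. load/store address = rs1 + imm
    }
  }
}
```

The first case matters: for addi, inst[31:25] belongs to the immediate, so a negative immediate must not be mistaken for sub.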

cpu

$ sbt "testOnly riscv.singlecycle.CPUTest"

This part connects the circuits of all modules.

Summary

After fixing all the errors, we get the following output:

$ sbt run
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt
[info] loading project definition from ca2023-lab3/project
[info] loading settings for project root from build.sbt
[info] set current project to mycpu (in build file:ca2023-lab3/)
[info] running board.verilator.VerilogGenerator
[success] Total time: 3 s, completed Nov 24, 2023, 2:56:00 PM

Program flow

Tracing the whole flow from C or assembly source to the execution phase.

Full flow diagram: to be done later.

sb.S

Run ByteAccessTest

sbt "testOnly riscv.singlecycle.ByteAccessTest"

First, we dump a waveform and open it in GTKWave:

$ WRITE_VCD=1 sbt test  
$ gtkwave ./test_run_dir/Single_Cycle_CPU_should_store_and_load_a_single_byte/TestTopModule.vcd

TestTopModule provides some debug ports:

val mem_debug_read_address  = Input(UInt(Parameters.AddrWidth))
val regs_debug_read_address = Input(UInt(Parameters.PhysicalRegisterAddrWidth))
val regs_debug_read_data    = Output(UInt(Parameters.DataWidth))
val mem_debug_read_data     = Output(UInt(Parameters.DataWidth))

We can drive them with poke, and the waveform diagram below confirms that we are indeed reading the corresponding registers.

c.io.regs_debug_read_address.poke(5.U) // t0
c.io.regs_debug_read_data.expect(0xdeadbeefL.U)

[Figure: GTKWave waveform of TestTopModule.vcd]

Next, we observe the MyCPU waveform, starting from the instruction stage: execution begins once io_instruction_valid is 1.

$ make verilator
$ ./run-verilator.sh -instruction src/main/resources/sb.asmbin -time 2000 -vcd dump.vcd
$ gtkwave dump.vcd 

[Figure: GTKWave waveform of dump.vcd]

I noticed that the clock periods in the two waveforms above differ, so I analyzed the clock usage in ByteAccessTest in detail. Inside TestTopModule, the CPU is clocked through withClock, which introduces the additional cycles. I will explore the different purposes of these two clocks in the future.

ByteAccessTest also pokes mem_debug_read_address inside a for loop, and I wanted to understand the meaning of this loop.

for (i <- 1 to 50) {
    c.clock.step()
    c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
}

We found that if the loop bound is reduced to 43 (i <- 1 to 43), an error occurs:

[info] Single Cycle CPU
[info] - should store and load a single byte *** FAILED ***
[info] io_regs_debug_read_data=0 (0x0) did not equal expected=5615 (0x15ef) (lines in CPUTest.scala: 114, 104) (CPUTest.scala:114)
[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error] riscv.singlecycle.ByteAccessTest
[error] (Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 10 s, completed Dec 1, 2023, 12:12:19 PM

So at least 44 cycles are needed. How can this number be calculated? The test program is:

@0
00400513  // addi x10, x0, 4
@1
deadc2b7  // lui x5, -136484
@2
eef28293  // addi x5, x5, -273
@3
00550023  // sb x5, 0(x10)
@4
00052303  // lw x6, 0(x10)
@5
01500913  // addi x18, x0, 21
@6
012500a3  // sb x18, 1(x10)
@7
00052083  // lw x1, 0(x10)
@8
0000006f  // jal x0, 0
@9
00000013  // nop
@a
00000013
@b
00000013
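The values the test checks can be reproduced by hand. The sketch below (hypothetical ByteAccessModel, assuming a zero-initialized, little-endian, byte-addressed data memory) replays the eight meaningful instructions as plain Scala arithmetic: t0 (x5) becomes 0xdeadbeef, the value checked with expect above, and x1 becomes 0x15ef = 5615, the value the failing run expected.

```scala
import scala.collection.mutable

object ByteAccessModel {
  private val Mask32 = 0xffffffffL
  // Byte-addressed data memory; unwritten bytes read as 0 (assumed).
  private val mem = mutable.Map.empty[Long, Long]

  private def sb(addr: Long, value: Long): Unit = mem.update(addr, value & 0xff)
  // Little-endian word load from an aligned address.
  private def lw(addr: Long): Long =
    (0 until 4).map(i => mem.getOrElse(addr + i, 0L) << (8 * i)).sum

  val x10: Long = 4L                                    // addi x10, x0, 4
  val x5: Long  = ((0xdeadcL << 12) + (-273L)) & Mask32 // lui x5, -136484 (imm 0xdeadc); addi x5, x5, -273
  sb(x10 + 0, x5)                                       // sb x5, 0(x10)
  val x6: Long  = lw(x10)                               // lw x6, 0(x10)
  val x18: Long = 21L                                   // addi x18, x0, 21
  sb(x10 + 1, x18)                                      // sb x18, 1(x10)
  val x1: Long  = lw(x10)                               // lw x1, 0(x10)
}
```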

A total of 12 instructions must first be loaded into memory by TestTopModule, one per cycle, which takes 12 cycles. After all instructions are in memory, io.load_finished is set to valid, and then MyCPU can run.

From the following code in TestTopModule, we can see that every four cycles of the original clock produce one CPU clock tick.

  • 4 ticks to 1 CPU tick
  val CPU_clkdiv = RegInit(UInt(2.W), 0.U)
  val CPU_tick   = Wire(Bool())
  val CPU_next   = Wire(UInt(2.W))
  CPU_next   := Mux(CPU_clkdiv === 3.U, 0.U, CPU_clkdiv + 1.U)
  CPU_tick   := CPU_clkdiv === 0.U
  CPU_clkdiv := CPU_next
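To check the 4-to-1 ratio, here is a software model of the divider above (hypothetical ClockDivModel; it models only the counter and the tick condition, not how TestTopModule turns the tick into a clock):

```scala
object ClockDivModel {
  // Models CPU_clkdiv (0, 1, 2, 3, 0, ...) and CPU_tick = (CPU_clkdiv === 0).
  def ticks(cycles: Int): Seq[Boolean] = {
    var clkdiv = 0
    (0 until cycles).map { _ =>
      val tick = clkdiv == 0
      clkdiv = if (clkdiv == 3) 0 else clkdiv + 1 // CPU_next
      tick
    }
  }

  // Number of CPU ticks within `cycles` cycles of the original clock.
  def cpuTicks(cycles: Int): Int = ticks(cycles).count(t => t)
}
```

cpuTicks(32) is 8, so 8 CPU instructions take 8 × 4 = 32 cycles of the original clock.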

Now we can calculate the number of cycles needed: first, 12 cycles to load the instructions into memory; then the CPU must execute 8 necessary instructions (jal and nop can be ignored since they do not affect the result), each taking 4 cycles of the original clock:

12+8×4=44

Memory Access

mem_address_index is taken from bits 1 and 0 of the address: log2Up(Parameters.WordSize) = log2Up(4) = 2, and subtracting 1 gives the upper bit index 1.
Because memory is accessed as aligned 32-bit words (as lw fetches them), lb, lbu and lh use this index to select which part of the word to return. For example, with lb the index determines which byte we want, and that byte is then sign-extended.

val mem_address_index = io.alu_result(log2Up(Parameters.WordSize) - 1, 0).asUInt
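A plain-Scala sketch of how the index selects part of the word (hypothetical ByteLaneModel; it assumes little-endian byte order and aligned halfword accesses for lh, and returns 32-bit results as unsigned Long bit patterns):

```scala
object ByteLaneModel {
  // mem_address_index = alu_result(1, 0)
  def index(aluResult: Long): Int = (aluResult & 0x3).toInt

  // Extracts a `bits`-wide field starting at byte `idx` of the word.
  private def field(word: Long, idx: Int, bits: Int): Long =
    (word >> (8 * idx)) & ((1L << bits) - 1)

  // Sign-extends a `bits`-wide value to 32 bits.
  private def signExtend(v: Long, bits: Int): Long =
    if ((v & (1L << (bits - 1))) != 0) (v | (0xffffffffL << bits)) & 0xffffffffL else v

  def lbu(word: Long, idx: Int): Long = field(word, idx, 8)
  def lb(word: Long, idx: Int): Long  = signExtend(field(word, idx, 8), 8)
  def lh(word: Long, idx: Int): Long  = signExtend(field(word, idx, 16), 16) // idx 0 or 2
}
```

For the word 0x000015ef that the byte-access program leaves at address 4, lb with index 1 returns 0x15, while lb of 0xdeadbeef with index 0 returns 0xffffffef.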

alu_result is connected in the CPU module:

mem.io.alu_result := ex.io.mem_alu_result

and it comes from the Execute stage:

io.mem_alu_result := alu.io.result

So the value is ultimately the ALU's result.


TestTopModule

In-depth analysis of the TestTopModule
From the Makefile, each {}.elf file is converted to {}.asmbin. readAsmBinary in InstructionROM.scala reads the .asmbin file through an inputStream, converts the instruction format, and writes the instructions to /verilog/verilator/filename.txt. Finally, loadMemoryFromFileInline loads the instructions into memory.
Init flow

val mem = Module(new Memory(8192))
val instruction_rom = Module(new InstructionROM(exeFilename))
val rom_loader = Module(new ROMLoader(instruction_rom.capacity))

The memory is initialized with 8192 words, each a Vec of 4 bytes (Vec(4, 8-bit)):

val mem = SyncReadMem(capacity, Vec(Parameters.WordSize, UInt(Parameters.ByteWidth)))

File to InstructionROM in InstructionROM.scala

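while (inputStream.read(arr) == 4) {
    // Each 4-byte chunk is one instruction stored little-endian
    val instBuf = ByteBuffer.wrap(arr)
    instBuf.order(ByteOrder.LITTLE_ENDIAN)
    // Mask so the value stays an unsigned 32-bit number
    val inst = BigInt(instBuf.getInt() & 0xffffffffL)
    instructions = instructions :+ inst
}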
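A self-contained version of the same loop (hypothetical AsmBinModel; the readWords wrapper is illustrative and not the lab's readAsmBinary signature) shows the byte order in action: the bytes 13 05 40 00 become the instruction 0x00400513.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.{ByteBuffer, ByteOrder}

object AsmBinModel {
  // Reads 4 bytes at a time and interprets each chunk as a little-endian word.
  def readWords(inputStream: InputStream): Seq[BigInt] = {
    val arr = new Array[Byte](4)
    var instructions = Seq.empty[BigInt]
    while (inputStream.read(arr) == 4) {
      val instBuf = ByteBuffer.wrap(arr)
      instBuf.order(ByteOrder.LITTLE_ENDIAN)
      instructions = instructions :+ BigInt(instBuf.getInt() & 0xffffffffL)
    }
    instructions
  }
}
```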

This ROMLoader module is a Chisel module designed for loading ROM data into RAM.

In src/main/scala/peripheral/ROMLoader.scala

Run Hw2

c code

step

$ source "$HOME/.sdkman/bin/sdkman-init.sh"
$ source "$HOME/riscv-none-elf-gcc/setenv" 
  1. Copy the C code of HW2 to csrc and remove printf.
  2. Add sine.asmbin to BINS in the Makefile.
  3. Run make update.
  4. Copy the test code and adapt it.
  5. $ sbt "testOnly riscv.singlecycle.SineTest"

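But the following error occurred. __mulsi3 is the libgcc helper that GCC calls for 32-bit multiplication when the target has no hardware multiply (no M extension), and the link command below does not include libgcc, so the reference stays undefined. I fixed it by changing one line of code, apparently so that GCC no longer emits a multiplication there, but I still don't fully understand why the original line produced one.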

riscv-none-elf-ld -o sine.elf -T link.lds --oformat=elf32-littleriscv sine.o init.o
riscv-none-elf-ld: sine.o: in function `.L17': sine.c:(.text+0x45c): undefined reference to `__mulsi3'
make: *** [Makefile:19: sine.elf] Error 1

- a = (a + (b & (-(a < 0)))) << 1;
+ uint32_t k = (a < 0);
+ a = (a + (b & (-k))) << 1;

With this change, the build succeeds.
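The rewritten line computes the same value as before, only through an unsigned temporary. A plain-Scala model (hypothetical MaskTrickModel; Scala's Int stands in for the C integer types, so it says nothing about GCC's code generation) checks that the mask form equals a simple branch:

```scala
object MaskTrickModel {
  // Mask form: add b only when a is negative, without a branch.
  def withMask(a: Int, b: Int): Int = {
    val k = if (a < 0) 1 else 0 // C's (a < 0) yields 0 or 1
    (a + (b & -k)) << 1         // -k is 0 (no bits) or -1 (all bits set)
  }

  // The same computation written with a branch.
  def withBranch(a: Int, b: Int): Int = (if (a < 0) a + b else a) << 1
}
```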

$ make verilator
$ ./run-verilator.sh -instruction src/main/resources/sine.asmbin -time 2000 -vcd dump.vcd
$ gtkwave dump.vcd

[Figure: GTKWave waveform of dump.vcd]

Handwritten assembly

We followed the C code approach and removed all the ecall instructions, and it worked.