Contributed by Ray Huang (coding-ray), 2023.
Primary references of this section:
- System Software Programming & jserv. (2023). Lab3: Construct a single-cycle RISC-V CPU with Chisel.
- System Software Programming & jserv. (2023). sysprog21/ca2023-lab3: Lab3: Construct a single-cycle CPU with Chisel | GitHub.
Operating system: Debian 12.2 (Bookworm)
    # remove all conflicting packages
    export CANDIDATES="docker.io docker-doc docker-compose podman-docker containerd runc"; \
    for pkg in $CANDIDATES; do \
      test ! -z "$(apt list --installed $pkg 2>&1 | sed -n 5p)" && \
      sudo apt purge -y --quiet $pkg; \
      test $? -ne 0 && \
      echo Not installed: $pkg; \
    done; \
    unset CANDIDATES

    # allow apt to use a repository over the HTTPS
    sudo apt update && sudo apt install -y ca-certificates curl gnupg

    # add Docker's official GPG key
    sudo install -m 0755 -d /etc/apt/keyrings && \
    curl -fsSL https://download.docker.com/linux/debian/gpg | \
    sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg && \
    sudo chmod a+r /etc/apt/keyrings/docker.gpg

    # set up the repository
    echo \
    "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
    "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

    # install the latest Docker engine
    sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

    # add the current user to the `docker` group
    sudo usermod -aG docker $USER

    # activate the changes to groups
    newgrp docker

    # check Docker version gives no error messages
    docker version

    # install GTKWave (waveform viewer)
    sudo apt install gtkwave

This installs the packages libjudydebian1 and gtkwave.

    # download and extract the toolchain
    cd /tmp
    wget https://github.com/xpack-dev-tools/riscv-none-elf-gcc-xpack/releases/download/v13.2.0-2/xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz
    tar zxf xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz

    # create a version memo
    echo 13.2.0-2 > xpack-riscv-none-elf-gcc-13.2.0-2/version.txt

    # move the toolchain to ~/.local/share/, and add it to PATH
    mkdir -p ~/.local/share
    mv xpack-riscv-none-elf-gcc-13.2.0-2 ~/.local/share/riscv-none-elf-gcc
    echo "export PATH=\"\$HOME/.local/share/riscv-none-elf-gcc/bin:\$PATH\"" >> ~/.bashrc
    . ~/.bashrc

    # make sure the toolchain is installed successfully
    riscv-none-elf-gcc -v

    # clean up
    rm -rf xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz

    # first run
    docker run -d -it --name chisel-bootcamp -p 8888:8888 sysprog21/chisel-bootcamp

    # stop with progress saved
    docker stop chisel-bootcamp

    # later run with progress restored
    docker start chisel-bootcamp

To attach to the ca-lab3 container once it is running:

    docker exec -it ca-lab3 /bin/bash

Inside the container, commands run as the root user, with privilege to create files in the current directory (mounted). So, to delete these files owned by root, you need to attach to the container first.

    # first run
    git clone https://github.com/coding-ray/2023-ca-lab-3 lab3
    cd lab3
    cp -r ~/.local/share/riscv-none-elf-gcc .
    docker build -t ca-lab3 .
    docker run -d -it --name ca-lab3 \
    --mount type=bind,src="$(pwd)",dst=/app \
    ca-lab3

    # stop with progress saved
    docker stop ca-lab3

    # later run with progress restored
    docker start ca-lab3

In this part, all the waveforms are generated by the following command. Drop the WRITE_VCD=1 prefix to run the test cases faster. (With the VCD, the tests take 25 seconds; without the VCD, they take 22 seconds on my old PC.)

    WRITE_VCD=1 sbt test

In the instruction fetching (IF) part (src/main/scala/riscv/core/InstructionFetch.scala), the missing part is to assign the program counter pc one of the following values.
- "pc + 4" if not to branch.
- "jump_address_id" (the address specified by the jump instruction) if to branch.

If the input flag jump_flag_id is set, it means "to branch".
In addition, the IF part does the following things.
- Pass instruction_read_data, which is the instruction read from memory, to the output signal instruction.
- If instruction_valid is not set, pc = pc implements a stall.
- The output instruction_address is always the value of pc.

Observations from the following waveform:
- The reset signal is set (pulled high), so the registers (pc) initialize with their default value (pc = entry address = 0x1000).
- While instruction_valid is not set, pc = pc implements a stall.
- Around the triggering clock edge, the input signals (jump_flag, jump_address_id, instruction_valid, io_instruction) stay still. This covers the setup time (before the triggering) and the hold time (after), to prevent undefined behavior.
- Both instruction_valid and jump_flag_id are set, so pc = jump_address_id = 0x1000. Although it is a branch from 0x1000 to 0x1000, it looks like a stall.

Observation from the following waveform: Not to branch (jump_flag_id = 0), so pc = pc + 4 (the instruction width is 4 bytes).
Observation from the moment in the following waveform: To branch, so pc = jump_address_id = 0x1000.
The observations above show that the IF part works as designed, though the output signal instruction is always 0 because the memory contains nothing.
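
The pc update rules described in this part can be summarized in a small behavioral model. This is only a sketch in Python, not the Chisel code of the lab: the function name next_pc is hypothetical, the priority order (reset, then stall, then branch) is an assumption inferred from the observations above, and the 32-bit wrap-around is also an assumption.

```python
ENTRY_ADDRESS = 0x1000  # reset value of pc, as observed in the waveform


def next_pc(pc, reset, instruction_valid, jump_flag_id, jump_address_id):
    """Return the value pc would hold after the next triggering clock edge."""
    if reset:
        return ENTRY_ADDRESS          # registers take their default value
    if not instruction_valid:
        return pc                     # pc = pc implements a stall
    if jump_flag_id:
        return jump_address_id        # to branch
    return (pc + 4) & 0xFFFFFFFF      # not to branch (assumed 32-bit pc)


# Replay the three situations seen in the waveforms.
after_reset = next_pc(0, True, False, False, 0)
after_stall = next_pc(0x1000, False, False, False, 0)
after_step = next_pc(0x1000, False, True, False, 0)
after_jump = next_pc(0x1004, False, True, True, 0x1000)
print(hex(after_reset), hex(after_stall), hex(after_step), hex(after_jump))
```

A branch whose target equals the current pc returns the same value as the stall case, which matches the remark that such a branch looks like a stall.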
In the instruction decoding (ID) part (src/main/scala/riscv/core/InstructionDecode.scala), the missing code does the following two things.
- If the instruction is L-type (lw lh lb lhu lbu, whose opcode is 0x3), the output flag memory_read_enable will be true/1. Otherwise, false/0.
- If the instruction is S-type (sw sh sb, whose opcode is 0x23), the output flag memory_write_enable will be true. Otherwise, false.

Observations from the following waveform:
- For sw a0, 4(zero) (0x00A02223), the lower 7 bits are opcode = 0x23, so this instruction is S-type and memory_write_enable is true.
- Since no instruction has opcode = 0x3, this test doesn't consider L-type instructions. As a result, memory_read_enable is always false.

In the instruction execution (EXE) part (src/main/scala/riscv/core/Execute.scala), the missing code does the following three things.
- Connect alu_funct from the ALU control unit (alu_ctrl) to the input funct of the ALU (alu).
- Set the input op1 of the ALU to instruction_address if it should be (indicated by aluop1_source set high). Otherwise, set it to the content of source register 1 (reg1_data).
- Set the input op2 of the ALU to immediate if it should be (indicated by aluop2_source set high). Otherwise, set it to the content of source register 2 (reg2_data).

The following waveform shows the case where op1 should be reg1_data and op2 should be reg2_data.
The following waveform shows the case where op1 should be instruction_address and op2 should be immediate.
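
The memory flags of the ID part and the operand selection of the EXE part can be sketched together as a behavioral model. This is an illustration in Python rather than the Chisel source; the function names decode_memory_flags and select_alu_operands are hypothetical, while the signal names and opcode values are taken from the text.

```python
OPCODE_LOAD = 0x03   # lw lh lb lhu lbu
OPCODE_STORE = 0x23  # sw sh sb


def decode_memory_flags(instruction):
    """Derive the two memory flags from the lower 7 bits (the opcode)."""
    opcode = instruction & 0x7F
    return {
        "memory_read_enable": opcode == OPCODE_LOAD,
        "memory_write_enable": opcode == OPCODE_STORE,
    }


def select_alu_operands(aluop1_source, aluop2_source,
                        instruction_address, reg1_data,
                        immediate, reg2_data):
    """Pick the ALU inputs op1 and op2 the way the two multiplexers would."""
    op1 = instruction_address if aluop1_source else reg1_data
    op2 = immediate if aluop2_source else reg2_data
    return op1, op2


# sw a0, 4(zero) is encoded as 0x00A02223; its lower 7 bits are 0x23.
sw_flags = decode_memory_flags(0x00A02223)
register_case = select_alu_operands(False, False, 0x1000, 11, 4, 22)
address_case = select_alu_operands(True, True, 0x1000, 11, 4, 22)
print(sw_flags, register_case, address_case)
```

The two calls to select_alu_operands correspond to the two waveform cases: both sources low selects the register contents, and both sources high selects the instruction address and the immediate.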
For this test (byte loading and writing) and the following three tests (quick sorting, 10th Fibonacci number, palindrome checker), the external assembly code is loaded into memory by the class TestTopModule in the file src/test/scala/riscv/singlecycle/CPUTest.scala. It loads the content of the file given by its argument, exeFilename, in binary into the instruction ROM (src/main/scala/peripheral/InstructionROM.scala).
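
To make "loading in binary" concrete, the following sketch turns a raw program image into the 32-bit words an instruction ROM would hold. It is a Python illustration, not the code of InstructionROM.scala: the function name bytes_to_words is hypothetical, and the little-endian word order and zero padding of a trailing partial word are assumptions (RISC-V instructions are stored little-endian).

```python
import struct


def bytes_to_words(image):
    """Split a binary image into little-endian 32-bit words, zero-padding the tail."""
    padded = image + b"\x00" * (-len(image) % 4)
    return [word for (word,) in struct.iter_unpack("<I", padded)]


# The four bytes of sw a0, 4(zero) as they would appear in the binary file.
words = bytes_to_words(bytes.fromhex("2322a000"))
print([hex(w) for w in words])
```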
There are some minor changes to the C code in homework 2 to make it work properly and be testable in "MyCPU". The code is csrc/ispalindrome.c.
- In is_palindrome(), I change the return values for palindrome and non-palindrome from (1, 0) to (1, 2). Otherwise, since the initial values in the memory of MyCPU are 0, it is ambiguous to have them identical to non-palindrome results.

    if (a == b)
        return 1; // palindrome
    else
        return 2; // not palindrome

- After each is_palindrome() returns, I save the result in a separate array. After all is_palindrome() finish, I write the results to a local fixed-size array, which is located in the stack of the program.
- After all is_palindrome() finish, I write the results to the memory located in bytes 4 through 20 (4-20), which is located in the code section of the program.

    for (int i = 1; i <= 4; i++) {
        *(volatile int *) (i * 4) = results[i - 1];
    }

- If I write the result of is_palindrome() to the code section right after it returns, I would observe that memory located in bytes 8-20 is 0 entirely. I don't know the reason, but I have the workaround above.
- The corresponding test is IsPalinedrome in the file src/test/scala/riscv/singlecycle/CPUTest.scala.

The cursor in the first waveform marks the moment that the program counter leaves 0x1000. It is at around 2.7 ns.
The cursor in the last waveform marks the moment that the program returns. It is at around 2716 ns.
Since the clock period is 2 ps, the program takes around (2716 ns - 2.7 ns) / 2 ps ≈ 1,357k clock cycles to finish.
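
The cycle count can be double-checked with a few lines of arithmetic. This is a minimal sketch: the helper name cycles_between is hypothetical, and the two cursor times are the approximate values read from the waveforms, converted to picoseconds.

```python
def cycles_between(start_ps, end_ps, clock_period_ps):
    """Number of whole clock periods between two cursor times (all in ps)."""
    return (end_ps - start_ps) // clock_period_ps


start_ps = 2_700      # about 2.7 ns: the program counter leaves 0x1000
end_ps = 2_716_000    # about 2716 ns: the program returns
cycles = cycles_between(start_ps, end_ps, clock_period_ps=2)
print(cycles)  # 1356650
```

Rounded to thousands, 1,356,650 cycles gives the 1,357k figure quoted above.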