# Assignment3: single-cycle RISC-V CPU
contributed by < [`GliAmanti`](https://github.com/GliAmanti) >

## Installation
My OS: **``Ubuntu 22.04 LTS``**

### Prepare GTKWave for Evaluation
```
sudo apt install build-essential verilator gtkwave
```

### Prepare sbt for Running Scala Code
1. Install [SDKMAN](https://sdkman.io/).
```
curl -s "https://get.sdkman.io" | bash
```
2. Open a new terminal window and run the following command.
```
source "$HOME/.sdkman/bin/sdkman-init.sh"
```
3. Install the JDK and sbt with [SDKMAN](https://sdkman.io/).
```
sdk install java 11.0.21-tem
sdk install sbt
```
:::success
Remember to repeat step **2** every time you open a new terminal to run sbt.
:::

## Description of [Hello World in Chisel](https://hackmd.io/@sysprog/r1mlr3I7p#Hello-World-in-Chisel)
```scala
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U;
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
* ``led`` is the output of the ``Hello`` module.
* ``CNT_MAX`` is the maximum value of the counter.
* ``cntReg`` is the counter.
* ``blkReg`` holds the current state of the LED.

``cntReg`` increments by 1 on every clock cycle. When ``cntReg`` equals ``CNT_MAX``, namely ``24999999``, it is reset to ``0`` and ``blkReg`` toggles. The value of ``blkReg`` drives the output ``led``.
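The counter behaviour described above can be sketched as a small software model. This is plain Scala rather than Chisel hardware, and the names ``BlinkModel``, ``State``, ``step`` and ``run`` are hypothetical helpers introduced only for illustration. A tiny ``cntMax`` is used so that the toggling is visible after a few cycles.

```scala
// Plain-Scala software model of the Hello counter (not Chisel hardware).
// All names here are hypothetical and exist only for this illustration.
object BlinkModel {
  // Register contents after some number of clock cycles.
  final case class State(cnt: Long, led: Int)

  // One clock edge: when the counter equals cntMax it wraps to 0 and the
  // LED bit toggles; otherwise the counter increments.
  def step(s: State, cntMax: Long): State =
    if (s.cnt == cntMax) State(0L, s.led ^ 1)
    else State(s.cnt + 1, s.led)

  // Apply `cycles` clock edges starting from the reset state (cnt = 0, led = 0).
  def run(cycles: Int, cntMax: Long): State =
    (0 until cycles).foldLeft(State(0L, 0))((s, _) => step(s, cntMax))

  def main(args: Array[String]): Unit = {
    // Assumption: a small cntMax instead of 24999999, to keep the run short.
    val cntMax = 3L
    for (n <- 0 to 8) println(s"cycle $n: ${run(n, cntMax)}")
  }
}
```

The LED toggles once every ``cntMax + 1`` cycles. With ``CNT_MAX = 24999999`` that is 25,000,000 cycles per toggle, which would be half a second if the board clock is 50 MHz (an assumption suggested by the constant ``50000000``, not stated in the lab text).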
## Complete Version of [MyCPU (Lab3)](https://hackmd.io/@sysprog/r1mlr3I7p#Implementation)
We only have to fill in the blanks in [``InstructionFetch.scala``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionFetch.scala), [``InstructionDecode.scala``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionDecode.scala), [``Execute.scala``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/Execute.scala) and [``CPU.scala``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/CPU.scala). Here is my [implementation](https://github.com/GliAmanti/ComputerArchitecture_HW3/tree/main/src/main/scala/riscv/core) of lab3, which is forked from [ca2023-lab3](https://github.com/sysprog21/ca2023-lab3/tree/main).

The following figure shows the RV32I datapath.
![datapath](https://hackmd.io/_uploads/Skc_nkWS6.png)

### How to Run
1. Get the repository.
```
git clone https://github.com/GliAmanti/ComputerArchitecture_HW3.git
cd ComputerArchitecture_HW3
```
2. To simulate and run the tests for this project, execute the following command in the ``ComputerArchitecture_HW3`` directory.
```
sbt test
```
::: info
The output message will be:
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project computerarchitecture_hw3-build from plugins.sbt ...
[info] loading project definition from /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/)
[info] compiling 1 Scala source to /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/target/scala-2.13/test-classes ...
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 6 seconds, 952 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 30, 2023, 1:24:28 PM
```
:::
3. If you want to run a single test, such as running only ``InstructionDecoderTest``, execute the following command:
```
sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
```
::: info
The output message will be:
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project computerarchitecture_hw3-build from plugins.sbt ...
[info] loading project definition from /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/)
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 2 seconds, 509 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 3 s, completed Nov 30, 2023, 1:29:16 PM
```
:::

### Description of Unit tests
:::success
If you see the icon :chart_with_upwards_trend:, please check the waveform for the details.
:::

#### InstructionFetchTest
This test verifies whether the [``InstructionFetch``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionFetch.scala) module sets ``PC`` to the correct address.

:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::

There are 2 cases:
#### 1. No jump
* ``jump_flag_id`` is set to ``false``, so ``PC := PC + 4``.
* :chart_with_upwards_trend: ``jump_flag_id`` is 0, so ``instruction_address`` changes from ``1000`` to ``1004``.
#### 2. Jump
* ``jump_flag_id`` is set to ``true``, so ``PC := jump_address_id``.
* In this test, ``PC`` jumps to ``entry``, namely ``0x1000``.
* :chart_with_upwards_trend: ``jump_flag_id`` is 1, so ``instruction_address`` changes from ``1008`` to ``1000``.

##### Waveform
![image](https://hackmd.io/_uploads/rJCoCX8B6.png)

#### InstructionDecoderTest
This test verifies whether the [``InstructionDecoder``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionDecode.scala) module correctly distinguishes the **opcode** and passes the right data on for each instruction.

:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**.
Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::

The decoder handles many instruction types, but this test only verifies 3 of them:
#### 1. S-type
* An S-type instruction adds the contents of ``rs1`` and ``simm12`` and uses the sum as the memory address, so:
    * ``aluop1_source := ALUOp1Source.Register``, that is, mux input ``0`` is selected.
    * ``aluop2_source := ALUOp2Source.Immediate``, that is, mux input ``1`` is selected.

:::success
According to the definition in [``InstructionDecoder``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionDecode.scala).
```scala
object ALUOp1Source {
  val Register = 0.U(1.W)
  val InstructionAddress = 1.U(1.W)
}

object ALUOp2Source {
  val Register = 0.U(1.W)
  val Immediate = 1.U(1.W)
}
```
:::

:::success
[SW/SH/SB](https://ithelp.ithome.com.tw/articles/10268196)
```
sw/sh/sb rs2, rs1, simm12
```
:::
* :chart_with_upwards_trend: The instruction ``00A02223`` can be identified as S-type from its lowest 7 bits (the opcode), ``010 0011`` in binary. So ``memory_write_enable`` is set to ``1``, and ``reg1_read_address = 0`` and ``immediate = 4`` are passed on to the next stage.
#### 2. lui
* This instruction loads ``uimm20`` into the upper 20 bits of ``rd`` and sets the remaining bits to 0, so:
    * ``aluop1_source := ALUOp1Source.Register``, that is, mux input ``0`` is selected.
    * ``aluop2_source := ALUOp2Source.Immediate``, that is, mux input ``1`` is selected.

:::success
[LUI (Load upper immediate)](https://ithelp.ithome.com.tw/articles/10268196)
```
lui rd, uimm20
```
:::
#### 3. add
* This instruction adds the contents of ``rs1`` and ``rs2``, so:
    * ``aluop1_source := ALUOp1Source.Register``, that is, mux input ``0`` is selected.
    * ``aluop2_source := ALUOp2Source.Register``, that is, mux input ``0`` is selected.

:::success
[ADD](https://ithelp.ithome.com.tw/articles/10268196)
```
add rd, rs1, rs2
```
:::

##### Waveform
![image](https://hackmd.io/_uploads/H1EfkEUr6.png)

#### ExecuteTest
This test verifies whether the [``Execute``](https://github.com/GliAmanti/ComputerArchitecture_HW3/tree/main/src/main/scala/riscv/core) module makes the right decision for branch instructions.

:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::

The module handles many instructions, but this test only verifies ``add`` and ``beq``.
#### 1. add
* ``ALU`` adds ``op1`` and ``op2``, as selected by ``funct``.
* In this test, ``op1`` is set to ``reg1_data`` and ``op2`` is set to ``reg2_data``. Both are random integers.
* ``jump_flag_id`` is set to ``false``.
#### 2. beq
* ``Branch Comp.`` compares the contents of ``rs1`` and ``rs2``, while ``ALU`` computes the jump address, so:
    * ``aluop1_source := ALUOp1Source.InstructionAddress``, that is, mux input ``1`` is selected.
    * ``aluop2_source := ALUOp2Source.Immediate``, that is, mux input ``1`` is selected.

:::success
[BEQ](https://ithelp.ithome.com.tw/articles/10268196)
```
beq rs1, rs2, simm13
```
:::
* ##### Equal
    * If ``reg1_data == reg2_data``, ``jump_flag_id`` is set to ``true``, and ``jump_address := immediate + instruction_address``.
    * In this test, ``jump_address := 2 + 2``
    * :chart_with_upwards_trend: When ``reg1_data = 9`` and ``reg2_data = 9``, the two values are equal, so ``jump_flag`` is asserted.
* ##### Not equal
    * If ``reg1_data != reg2_data``, ``jump_flag_id`` is set to ``false``. The address ``jump_address := immediate + instruction_address`` is still computed.
    * In this test, ``jump_address := 2 + 2``
    * :chart_with_upwards_trend: When ``reg1_data = 17FE18F6`` and ``reg2_data = 15D9D5DD``, the two values differ, so ``jump_flag`` is not asserted.

##### Waveform
![image](https://hackmd.io/_uploads/Byyza7Lra.png)

#### CPUTest
This test verifies whether all components in **MyCPU** function properly together.

:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::

There are 3 test cases in CPUTest:
#### 1. FibonacciTest
* This class loads the file ``fibonacci.asmbin``, which calculates Fibonacci(10).
* In this test, ``mem_debug_read_address`` is set to ``4``, and ``mem_debug_read_data`` is expected to be ``55``.
#### 2. QuicksortTest
* This class loads the file ``quicksort.asmbin``, which performs a quicksort on 10 numbers.
* In this test, ``mem_debug_read_address`` is set to ``4 * i``, and ``mem_debug_read_data`` is expected to be ``i - 1``.
#### 3. ByteAccessTest
* This class loads the file ``sb.asmbin``, which stores and loads a single byte.
* In test case 1, ``regs_debug_read_address`` is set to ``5``, and ``regs_debug_read_data`` is expected to be ``0xdeadbeef``.
* In test case 2, ``regs_debug_read_address`` is set to ``6``, and ``regs_debug_read_data`` is expected to be ``0xef``.
* In test case 3, ``regs_debug_read_address`` is set to ``1``, and ``regs_debug_read_data`` is expected to be ``0x15ef``.

:::success
:question: My guess :question:
Signals defined inside the ``withClock()`` block are implicitly reset, so they do not show up in the GTKWave waveform, for example ``regs_debug_read_address`` and ``regs_debug_read_data``.
:::

## Adaptation of Homework2
Here is my [adaptation](https://github.com/GliAmanti/ComputerArchitecture_HW3/tree/main/csrc) of hw2.
### How to Run
To run hw2 on **MyCPU**, I made the following changes.
1. Remove the code related to ``rdcycle`` and ``rdcycleh`` in ``myHammingDist.S``.
2. Store the results at the memory addresses that I check in ``CPUTest.scala``.
```
# base memory address to store the result
li s9, 0x4
......
# store the result for hw3
slli t1, s1, 2
add t0, s9, t1
sw a0, 0(t0)
```
3. Add the corresponding test to ``CPUTest.scala``.

::: spoiler HammingTest
```scala
class HammingTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "calculate the hamming distance between two 64-bit integers" in {
    test(new TestTopModule("myHammingDist.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }

      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(21.U)

      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(63.U)

      c.io.mem_debug_read_address.poke(12.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)
    }
  }
}
```
:::

::: danger
If you check the result through a **memory address**, make sure ``c.clock.step(1000)`` is inside your loop. Otherwise, the test fails with an error message.
:::

4. Add ``myHammingDist.S`` to the ``csrc`` directory.
5. Modify the ``Makefile`` in the ``csrc`` directory.

::: spoiler Makefile
```makefile
CROSS_COMPILE ?= riscv-none-elf-

ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy

%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<
%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)
%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o

%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@

# myHammingDist.asmbin is the entry added for hw2.
BINS = \
	fibonacci.asmbin \
	hello.asmbin \
	mmio.asmbin \
	quicksort.asmbin \
	sb.asmbin \
	myHammingDist.asmbin

# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=

all: $(BINS)

update: $(BINS)
	cp -f $(BINS) ../src/main/resources

clean:
	$(RM) *.o *.elf *.asmbin
```
:::

6. Generate ``myHammingDist.asmbin``.
```
make update
```
or
```
make clean
make
```
7. Run the test.
```
sbt "testOnly riscv.singlecycle.HammingTest"
```

### [How to Analyze](https://hackmd.io/@sysprog/r1mlr3I7p#Waveform)
1. After the first run, and every time you modify the Chisel code, execute the following command in the project’s root directory to generate the Verilog files.
```
make verilator
```
2. Load ``myHammingDist.asmbin``, run the simulation with ``-time 1000``, and save the waveform to ``myHammingDist.vcd``.
```
./run-verilator.sh -instruction src/main/resources/myHammingDist.asmbin -time 1000 -vcd myHammingDist.vcd
```
::: info
The output message will be:
```
-time 1000
-memory 1048576
-instruction src/main/resources/myHammingDist.asmbin
[-------------------->] 100%
```
:::
3. Use GTKWave to view the output waveform file ``myHammingDist.vcd``.
```
gtkwave myHammingDist.vcd
```

### Description of Waveform
I take the instruction ``lw a0, 0(s2)`` as an example and analyze how **MyCPU** handles it in each stage. The values below are shown in hexadecimal, as in the waveform.
#### Instruction Fetch
![image](https://hackmd.io/_uploads/Sysop-8Hp.png)
* Instruction ``00092503`` is ``lw a0, 0(s2)``. (Line 52 in my code.)
* It is neither an ``SB-type`` nor a ``UJ-type`` instruction, so:
    * ``jump_flag_id = 0``,
    * ``PC = PC + 4``.

::: success
:mag: Please check [here](https://luplab.gitlab.io/rvcodecjs/) for instruction conversion.
:::

#### Instruction Decode
![image](https://hackmd.io/_uploads/BJLFQG8HT.png)
* ``lw`` loads data from the memory address ``rs1 + imm`` into ``rd``, so:
    * ``memory_read_enable = 1``
    * ``memory_write_enable = 0``
    * ``reg_write_address = A`` (``a0`` = ``x10``)
    * ``reg_write_enable = 1``
    * ``rs1 = 12`` (``s2`` = ``x18``)
    * ``rd = A``
    * ``immediate = 0``

#### Execute
![image](https://hackmd.io/_uploads/rkbAPfUBp.png)
* According to [``ALUControl``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/ALUControl.scala) and [``ALU``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/ALU.scala), ``alu_funct = ALUFunctions.add``, namely ``1``.
```scala
is(InstructionTypes.L) {
  io.alu_funct := ALUFunctions.add
}
```
```scala
object ALUFunctions extends ChiselEnum {
  val zero, add, sub, sll, slt, xor, or, and, srl, sra, sltu = Value
}
```
* ``aluop1_source = 0``, so ``op1 = reg1_data``.
* ``aluop2_source = 1``, so ``op2 = immediate``.
* ``ALU`` adds ``op1 = FFFFFFF4`` and ``op2 = 00000000``, giving ``result = FFFFFFF4``.

#### Memory Access
![image](https://hackmd.io/_uploads/Bk8yx7Ir6.png)
* ``memory_read_enable = 1``, so ``Memory`` outputs ``read_data = 00000000``, the word stored at memory address ``FFFFFFF4``.
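
The field values read off the waveform in the stages above can be cross-checked by decoding ``00092503`` by hand. The following is a minimal sketch in plain Scala (not the Chisel decoder from the repository); ``DecodeLw``, ``IType``, ``decode`` and ``loadAddress`` are hypothetical names used only here, and the bit positions follow the standard RV32I I-type layout.

```scala
// Software cross-check of the I-type fields of an RV32I instruction word.
// Hypothetical helper, not part of MyCPU.
object DecodeLw {
  final case class IType(opcode: Int, rd: Int, funct3: Int, rs1: Int, imm: Int)

  def decode(inst: Int): IType = IType(
    opcode = inst & 0x7f,          // bits 6..0
    rd     = (inst >>> 7) & 0x1f,  // bits 11..7
    funct3 = (inst >>> 12) & 0x7,  // bits 14..12
    rs1    = (inst >>> 15) & 0x1f, // bits 19..15
    imm    = inst >> 20            // bits 31..20, sign-extended by the arithmetic shift
  )

  // Effective address of a load: rs1 value plus immediate, wrapping at 32 bits.
  def loadAddress(rs1Value: Int, imm: Int): Int = rs1Value + imm

  def main(args: Array[String]): Unit = {
    val f = decode(0x00092503)
    println(f) // rd = 10 (a0, hex A), rs1 = 18 (s2, hex 12), imm = 0
    // 0xFFFFFFF4 is the reg1_data value shown in the Execute waveform.
    println(f"address = 0x${loadAddress(0xFFFFFFF4, f.imm)}%08X")
  }
}
```

The decoded ``rd``, ``rs1`` and ``imm`` agree with the Instruction Decode bullets, and the computed address matches the ALU ``result`` in the Execute stage.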

#### Write Back
![image](https://hackmd.io/_uploads/B1tdU7Lrp.png)
* According to [``WriteBack``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/WriteBack.scala) and [``InstructionDecode``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionDecode.scala), ``regs_write_source = 1``, so ``regs_write_data = memory_read_data``, namely ``00000000``.
```scala
io.regs_write_data := MuxLookup(
  io.regs_write_source,
  io.alu_result,
  IndexedSeq(
    RegWriteSource.Memory -> io.memory_read_data,
    RegWriteSource.NextInstructionAddress -> (io.instruction_address + 4.U)
  )
)
```
```scala
object RegWriteSource {
  val ALUResult = 0.U(2.W)
  val Memory = 1.U(2.W)
  val NextInstructionAddress = 3.U(2.W)
}
```
* The result is written back to ``write_address = A``.
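
The selection made by the ``MuxLookup`` above can be mimicked in plain Scala to see which value reaches the register file for each ``regs_write_source``. This is only a software sketch under the encodings listed in ``RegWriteSource``; ``WriteBackModel`` and ``regsWriteData`` are hypothetical names, and the instruction address ``0x1000`` in the example is an arbitrary placeholder, not a value taken from the waveform.

```scala
// Plain-Scala model of the write-back selection (not Chisel hardware).
object WriteBackModel {
  // Same encodings as RegWriteSource in the document.
  val ALUResult = 0
  val Memory = 1
  val NextInstructionAddress = 3

  // Mirrors the MuxLookup: any source without an explicit entry
  // falls back to the ALU result.
  def regsWriteData(source: Int, aluResult: Int, memoryReadData: Int, instructionAddress: Int): Int =
    source match {
      case Memory                 => memoryReadData
      case NextInstructionAddress => instructionAddress + 4
      case _                      => aluResult
    }

  def main(args: Array[String]): Unit = {
    // lw case from the walkthrough: source = 1, ALU result FFFFFFF4, memory data 0.
    // The instruction address 0x1000 is a placeholder assumption.
    val data = regsWriteData(Memory, 0xFFFFFFF4, 0x00000000, 0x1000)
    println(f"regs_write_data = 0x$data%08X")
  }
}
```

Because ``ALUResult`` is the default branch, the unused encoding ``2`` also selects the ALU result in this model, just as the default argument of ``MuxLookup`` does.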