# Assignment3: SoftCPU
contributed by < `dck9661` >

## Modify the assembly programs for Reindeer Simulation
### Analyze
The Reindeer GitHub repository only lets us execute the ELF files provided by [Reindeer](https://github.com/PulseRain/Reindeer/tree/master/sim/compliance). To run our own test, we need to compile the assembly code ourselves. So we first analyze the assembly code provided by [riscv-compliance](https://github.com/riscv/riscv-compliance) to learn how to compile our code so that it runs with the Reindeer Verilator simulation.

Take [I-ADD-01.S](https://github.com/riscv/riscv-compliance/blob/master/riscv-test-suite/rv32i/src/I-ADD-01.S) as an example. In this code, we need to know four things:
1. The assembly code uses many macros from different header files: compliance_test.h, compliance_io.h, and test_macros.h.
2. It uses the numeric register names x0 to x31 instead of the ABI names (ra, sp, t1, ...).
3. The input data section provides the initial global data.
4. The output data section shows whether the result is correct.

```cpp=
#include "compliance_test.h"
#include "compliance_io.h"
#include "test_macros.h"

# Test Virtual Machine (TVM) used by program.
RV_COMPLIANCE_RV32M

# Test code region
RV_COMPLIANCE_CODE_BEGIN

    RVTEST_IO_INIT
    RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
    RVTEST_IO_WRITE_STR(x31, "# Test Begin\n")

    # ---------------------------------------------------------------------------------------------
    RVTEST_IO_WRITE_STR(x31, "# Test part A1 - general test of value 0 with 0, 1, -1, MIN, MAX immediate values\n");

    # Addresses for test data and results
    la      x1, test_A1_data
    la      x2, test_A1_res

    # Load testdata
    lw      x3, 0(x1)

    # Test
    addi    x4, x3, 1
    addi    x5, x3, 0x7FF
    addi    x6, x3, 0xFFFFFFFF
    addi    x7, x3, 0
    addi    x8, x3, 0xFFFFF800

    # Store results
    sw      x3, 0(x2)
    sw      x4, 4(x2)
    sw      x5, 8(x2)
    sw      x6, 12(x2)
    sw      x7, 16(x2)
    sw      x8, 20(x2)

    //
    // Assert
    //
    RVTEST_IO_CHECK()
    RVTEST_IO_ASSERT_GPR_EQ(x2, x3, 0x00000000)
    RVTEST_IO_ASSERT_GPR_EQ(x2, x4, 0x00000001)
    RVTEST_IO_ASSERT_GPR_EQ(x2, x5, 0x000007FF)
    RVTEST_IO_ASSERT_GPR_EQ(x2, x6, 0xFFFFFFFF)
    RVTEST_IO_ASSERT_GPR_EQ(x2, x7, 0x00000000)
    RVTEST_IO_ASSERT_GPR_EQ(x2, x8, 0xFFFFF800)

    RVTEST_IO_WRITE_STR(x31, "# Test part A1 - Complete\n");

    # ---------------------------------------------------------------------------------------------
    # HALT
    RV_COMPLIANCE_HALT

RV_COMPLIANCE_CODE_END

# Input data section.
    .data
    .align 4

test_A1_data:
    .word 0
test_A2_data:
    .word 1
test_A3_data:
    .word -1
test_A4_data:
    .word 0x7FFFFFFF
test_A5_data:
    .word 0x80000000
test_B_data:
    .word 0x0000ABCD
test_C_data:
    .word 0x12345678
test_D_data:
    .word 0xFEDCBA98
test_E_data:
    .word 0x36925814

# Output data section.
RV_COMPLIANCE_DATA_BEGIN
    .align 4

test_A1_res:
    .fill 6, 4, -1
test_A2_res:
    .fill 6, 4, -1
test_A3_res:
    .fill 6, 4, -1
test_A4_res:
    .fill 6, 4, -1
test_A5_res:
    .fill 6, 4, -1
test_B_res:
    .fill 7, 4, -1
test_C_res:
    .fill 1, 4, -1
test_D_res:
    .fill 2, 4, -1
test_E_res:
    .fill 4, 4, -1

RV_COMPLIANCE_DATA_END
```

Once we understand how the assembly above works, the next step is a linker script that matches the Reindeer simulation testbench.

```cpp=
OUTPUT_ARCH( "riscv" )
ENTRY(_start)

SECTIONS
{
  . = 0x00000000;
  .text.trap : { *(.text.trap) }
  . = 0x80000000;
  .text.init : { *(.text.init) }
  . = ALIGN(0x1000);
  .tohost : { *(.tohost) }
  . = ALIGN(0x1000);
  .text : { *(.text) }
  . = ALIGN(0x1000);
  .data : { *(.data) }
  .data.string : { *(.data.string)}
  .bss : { *(.bss) }
  _end = .;
}
```

The final step is a Makefile that compiles the assembly code.

```cpp=
RISCV_PREFIX   ?= riscv32-unknown-elf-
RISCV_GCC      ?= $(RISCV_PREFIX)gcc
RISCV_OBJDUMP  ?= $(RISCV_PREFIX)objdump
RISCV_GCC_OPTS ?= -static -mcmodel=medany -fvisibility=hidden -nostdlib -nostartfiles
ROOTDIR        ?= ${your_path}/riscv-compliance
TARGETDIR      ?= $(ROOTDIR)/riscv-target
RISCV_TARGET   ?= riscvOVPsim
APP_SRC = I-ADDI-01.S
EXE = I-ADDI-01.elf

all:
	$(RISCV_GCC) $(APP_SRC) $(RISCV_GCC_OPTS) \
		-I$(ROOTDIR)/riscv-test-env/ \
		-I$(ROOTDIR)/riscv-test-env/p/ \
		-I$(TARGETDIR)/$(RISCV_TARGET)/ \
		-T$(ROOTDIR)/riscv-test-env/p/link.ld \
		-o $(EXE)
```

### Write my own assembly code (bubble sort)
```cpp=
# RISC-V Compliance Test Bubble_sort.S

#include "compliance_test.h"
#include "compliance_io.h"
#include "test_macros.h"

# Test Virtual Machine (TVM) used by program.
RV_COMPLIANCE_RV32M

# Test code region.
RV_COMPLIANCE_CODE_BEGIN

    RVTEST_IO_INIT
    RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
    RVTEST_IO_WRITE_STR(x31, "# Test Begin\n")

    # Addresses for test data and results
    la      x1, test_A1_data
    la      x2, test_A1_res

    # Load testdata
    lw      x3, 0(x1)

    # Register initialization
    addi    x3, x2, -32     // x3 = base of a 32-byte scratch area below test_A1_res
    li      x4, 2
    sw      x4, 12(x3)
    li      x4, 1
    sw      x4, 16(x3)
    li      x4, 5
    sw      x4, 20(x3)
    li      x4, 4
    sw      x4, 24(x3)
    li      x4, 3
    sw      x4, 28(x3)
    li      x5, 4           // i = 4 (outer loop: remaining passes)
4:
    li      x6, 0           // j = 0 (inner loop counter)
    addi    x4, x3, 12      // x4 = address of the first element, 12(x3)
    bge     x6, x5, 1f
3:
    lw      x7, 0(x4)       // first element
    lw      x8, 4(x4)       // second element
    addi    x6, x6, 1       // j++
    bge     x8, x7, 2f      // already in order, skip the swap
    sw      x8, 0(x4)       // swap the two elements
    sw      x7, 4(x4)
2:
    addi    x4, x4, 4       // next element address
    blt     x6, x5, 3b
1:
    addi    x5, x5, -1      // i--
    bnez    x5, 4b

    // Copy the sorted array to the signature area, largest element first
    lw      x6, 28(x3)
    lw      x5, 24(x3)
    lw      x4, 20(x3)
    lw      x7, 16(x3)
    lw      x8, 12(x3)
    sw      x6, 0(x2)
    sw      x5, 4(x2)
    sw      x4, 8(x2)
    sw      x7, 12(x2)
    sw      x8, 16(x2)
    addi    x3, x3, 32

    # ---------------------------------------------------------------------------------------------
    # HALT
    RV_COMPLIANCE_HALT

RV_COMPLIANCE_CODE_END

# Input data section.
    .data

test_A1_data:
    .word 0
test_A2_data:
    .word 1
test_A3_data:
    .word -1
test_A4_data:
    .word 0x7FFFFFFF
test_A5_data:
    .word 0x80000000
test_B_data:
    .word 0x0000ABCD
test_C_data:
    .word 0x12345678
test_D_data:
    .word 0xFEDCBA98
test_E_data:
    .word 0x36925814

# Output data section.
RV_COMPLIANCE_DATA_BEGIN

test_A1_res:
    .fill 6, 4, -1

RV_COMPLIANCE_DATA_END
```

### Result: run on the Reindeer Simulator
Write your own golden data in the reference file.
```cpp=
=============================================================
=== PulseRain Technology, RISC-V RV32IM Test Bench
=============================================================
     elf file  : ../compliance/Bubble_sort.elf
     reference : ../compliance/references/Bubble_sort.reference_output

     start address = 0x80000000
     begin signature address = 0x80002030
     end signature address = 0x80002050
```

Instruction section:
```cpp=
Loading section .text.init ... 1c4 bytes, LMA = 0x80000000
     80000000 04c0006f
     80000004 34202f73
     80000008 00800f93
     8000000c 03ff0a63
     80000010 00900f93
     80000014 03ff0663
     80000018 00b00f93
     8000001c 03ff0263
     80000020 80000f17
     80000024 fe0f0f13
     80000028 000f0463
     8000002c 000f0067
     80000030 34202f73
     80000034 000f5463
     80000038 0040006f
     8000003c 5391e193
     80000040 00001f17
     80000044 fc3f2023
     .
     .
     .
     .
```

Initial global data section:
```cpp=
Loading section .data ... 516 bytes, LMA = 0x80002000
     80002000 00000000
     80002004 00000001
     80002008 ffffffff
     8000200c 7fffffff
     80002010 80000000
     80002014 0000abcd
     80002018 12345678
     8000201c fedcba98
     80002020 36925814
     80002024 00000000
     80002028 00000000
     8000202c 00000000
     80002030 ffffffff
     80002034 ffffffff
     80002038 ffffffff
     8000203c ffffffff
     80002040 ffffffff
```

Check that the sorting is correct:
```cpp=
========> Matching signature ...

     80002030 00000005     PASS
     80002034 00000004     PASS
     80002038 00000003     PASS
     8000203c 00000002     PASS
     80002040 00000001     PASS

======> Signature ALL MATCH!!!

=============================================================
Simulation exit ../compliance/Bubble_sort.elf
Wave trace Bubble_sort.vcd
=============================================================
```

## Explain how your program run in Reindeer Simulation
### Check the hierarchy about HDL code
```cpp=
HDL
|-- Reindeer.v
    |-- PLL (clock control)
    |-- MCU (PulseRain_RV2T_MCU)
    |-- OCD (debugger)
|-- MCU
    |-- UART_TX.v
    |-- port_ram.v
    |-- PulseRain_processor_core
        |-- RV2T_controller
        |-- RV2T_CSR
        |-- RV2T_data_access
        |-- RV2T_execution
        |-- RV2T_fetch_inst
        |-- RV2T_inst_decode
        |-- RV2T_machine_timer
        |-- RV2T_mm_reg
        |-- RV2T_reg_file
|-- OCD
    |-- debug_coprocessor_wrapper
        |-- debug_coprocessor
        |-- debug_reply
        |-- debug_UART
```

### Explain the instruction run in Reindeer Simulation
I chose one load instruction to illustrate the result, because a load instruction lets us see every step: fetch, decode, register read, execution, memory read, and register write-back. Now we focus on PC = 0x80000278, instruction = 0x0000ab83.
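As a cross-check of the fields to look for in the waveform, the instruction word 0x0000ab83 can be decoded by hand. The following is a minimal host-side sketch, not part of the Reindeer sources: `ITypeFields`, `decode_itype`, `effective_address` and `check_lw_example` are names assumed here for illustration. It relies only on the standard RV32I I-type layout (opcode in bits 6:0, rd in bits 11:7, funct3 in bits 14:12, rs1 in bits 19:15, immediate in bits 31:20).

```cpp
#include <cassert>
#include <cstdint>

// Fields of an RV32I I-type instruction (the format used by lw and addi).
struct ITypeFields {
    uint32_t opcode;  // bits 6:0
    uint32_t rd;      // bits 11:7
    uint32_t funct3;  // bits 14:12
    uint32_t rs1;     // bits 19:15
    int32_t  imm;     // bits 31:20, sign-extended
};

// Hypothetical helper: split a 32-bit instruction word into I-type fields.
constexpr ITypeFields decode_itype(uint32_t inst) {
    int32_t imm = static_cast<int32_t>(inst >> 20);  // raw 12-bit immediate
    if (imm & 0x800) imm -= 0x1000;                   // sign-extend from bit 11
    return ITypeFields{inst & 0x7Fu, (inst >> 7) & 0x1Fu,
                       (inst >> 12) & 0x7u, (inst >> 15) & 0x1Fu, imm};
}

// Address accessed by a load: rs1 value plus the sign-extended immediate
// (32-bit wrap-around arithmetic).
constexpr uint32_t effective_address(uint32_t rs1_value, int32_t imm) {
    return rs1_value + static_cast<uint32_t>(imm);
}

// Checks the lw at PC = 0x80000278 against the values quoted in the text.
inline void check_lw_example() {
    const ITypeFields f = decode_itype(0x0000ab83u);
    assert(f.opcode == 0x03);  // LOAD opcode
    assert(f.funct3 == 2);     // word-sized load (lw)
    assert(f.rs1 == 1);        // x1 = ra
    assert(f.rd == 23);        // x23 = s7 (0x17)
    assert(f.imm == 0);
    assert(effective_address(0x80002010u, f.imm) == 0x80002010u);
}
```

Calling `check_lw_example()` from any test driver confirms that rs1 = 1, rd = 0x17 and the access address 0x80002010 follow directly from the encoding.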
We know it calculates the address as ra + 0. In this case, ra holds 0x80002010, set by the instruction at PC = 0x8000026c, so we expect the lw instruction to access memory at address 0x80002010 and, in the end, write the data to the s7 register.

```cpp=
80000268:	00002097          	auipc	ra,0x2
8000026c:	da808093          	addi	ra,ra,-600 # 80002010 <test_A5_data>
80000270:	00002117          	auipc	sp,0x2
80000274:	e2010113          	addi	sp,sp,-480 # 80002090 <test_A5_res>
80000278:	0000ab83          	lw	s7,0(ra)
8000027c:	00000c13          	li	s8,0
80000280:	00100c93          	li	s9,1
80000284:	fff00d13          	li	s10,-1

80002010 <test_A5_data>:
80002010:	0000                	unimp
80002012:	8000                	0x8000
```

The waveform shows the progress of the lw instruction.
* In the fetch stage: PC is 0x80000278, so the instruction is 0x0000ab83.
* In the decode stage: rs1 = 1 because the ra register is x1, and rd = 0x17 because the s7 register is x23.
* In the execute stage: mem_access_addr is 0x8000_2010, which is ra + 0.
* In the memory stage: the word_out signal is 0x8000_0000, because memory[ra] initially holds test_A5_data, which is 0x8000_0000, so the memory stage reads this data out.
* In the write-back stage: write_addr = 0x17 and write_data_in = 0x8000_0000. In the next cycle, mem[23] in the register file holds 0x8000_0000.

![](https://i.imgur.com/RjHHqFS.png)

## Explain how Reindeer works with Verilator
1. Reset the CPU, put it into hold state
```cpp=
uut->reset_n = 0;       // Set some inputs
uut->sync_reset = 0;
std::cout << "\n=============> reset..." << "\n";

uut->sync_reset = 0;
uut->ocd_reg_we = 0;
uut->ocd_reg_read_addr = 0;
uut->ocd_reg_write_addr = 0;
uut->ocd_reg_write_data = 0;

uut->ocd_read_enable = 0;
uut->ocd_write_enable = 0;

uut->ocd_rw_addr = 0;
uut->ocd_write_word = 0;

uut->start = 0;
uut->start_address = start_address;
t.reset();
```
2. Call upon toolchain to extract code/data from the .elf file for the test case
```cpp=
std::cout << "=============> load elf file..." << "\n";
load_elf_sections(&t, uut);
t.run();
```
3. Start the CPU, run for 2000 clock cycles
```cpp=
std::cout << "\n=============> start running ..." << "\n";
uut->start = 1;
t.run();
uut->start = 0;
t.run();
```
4. Reset the CPU, put it into hold state for the second time
```cpp=
t.reset();
```
5. Read the data out of the memory, and compare them against the reference signature
```cpp=
if (sig_list.size()) {
    std::cout << "\n========> Matching signature ...\n\n";
    if (uut_memory_peek (&t, uut, begin_signature_addr, sig_list.size() * 4, true)) {
        std::cout << "\n======> Signature MISMATCH!!!\n";
        ret = -1;
    } else {
        std::cout << "\n======> Signature ALL MATCH!!!\n";
    }
}
```

## How the simulation does for bootstrapping
We don't need a ROM or flash to store the boot code; instead, the host PC transmits the data to the debug module. The OCD puts the data into the CPU's local memory and starts the CPU. After the program finishes, the OCD sends a signal to the TX mux module so that the CPU's data can be transmitted to the host PC through TX_CPU. In the end, we can see the result on the PC.
![](https://i.imgur.com/tBMkPSK.png)
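
The signature comparison in step 5 of the Verilator flow can be modeled on the host as follows. This is only a sketch under stated assumptions: `MemoryImage`, `count_signature_mismatches` and `bubble_sort_signature_matches` are hypothetical names introduced for illustration, and the `std::map` stands in for the CPU memory. It is not the implementation of the testbench's `uut_memory_peek`, whose internals are not shown in this note. The sketch walks the reference words starting at the begin-signature address and counts the 32-bit words that differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical memory image: word-aligned address -> 32-bit word.
using MemoryImage = std::map<uint32_t, uint32_t>;

// Count reference words that do not match memory, starting at begin_addr.
// A word that is missing from the image is treated as a mismatch.
inline std::size_t count_signature_mismatches(
        const MemoryImage& mem, uint32_t begin_addr,
        const std::vector<uint32_t>& reference) {
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const uint32_t addr = begin_addr + static_cast<uint32_t>(i * 4);
        const auto it = mem.find(addr);
        if (it == mem.end() || it->second != reference[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}

// Example with the bubble sort signature from the simulation log above:
// five words starting at 0x80002030, expected in descending order.
inline bool bubble_sort_signature_matches() {
    const MemoryImage mem = {
        {0x80002030u, 5u}, {0x80002034u, 4u}, {0x80002038u, 3u},
        {0x8000203cu, 2u}, {0x80002040u, 1u}};
    const std::vector<uint32_t> reference = {5u, 4u, 3u, 2u, 1u};
    return count_signature_mismatches(mem, 0x80002030u, reference) == 0;
}
```

A zero mismatch count corresponds to the "Signature ALL MATCH!!!" branch of the testbench, and any non-zero count corresponds to the "Signature MISMATCH!!!" branch.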