# Reindeer lab3
contributed by < `kksweet8845` >

## Source code
taken from [assignment 1](https://hackmd.io/@kksweet8845/lab1r32i)

```cpp
# This example shows an implementation of the
# mathematical combination formula.
.data
argument: .word 7 5
.text
main:
    # Initialize the argument registers
    addi a2, zero, 0x7    # The n of nCk
    addi a3, zero, 0x5    # The k of nCk
    addi a0, zero, 0      # The return value of nCk
    jal  comb             # Begin the routine of comb
    j    end              # After finished, goto end:
comb:
    addi sp, sp, -12      # Create stack
    sw   ra, 8(sp)        # Save return address
    sw   a2, 4(sp)        # Save caller argument $a2(The n of nCk)
    sw   a3, 0(sp)        # Save caller argument $a3(The k of nCk)
    beq  a2, a3, if       # Jump if nCn == 1
    beq  a3, zero, if     # Jump if nC0 == 1
    addi a2, a2, -1       # Sub 1 from $a2
    addi a3, a3, -1       # Sub 1 from $a3
    jal  comb             # The recursion of (n-1)C(k-1)
    lw   a3, 0(sp)        # Restore $a3 to $a3 + 1
    jal  comb             # The recursion of (n-1)C(k)
    lw   ra, 8(sp)        # Restore $ra, $a2, $a3
    lw   a2, 4(sp)        # ''
    lw   a3, 0(sp)        # ''
    addi sp, sp, 12       # Pop the stack
    jr   ra               # Return to ${ra}
if:
    addi sp, sp, 12       # Pop the stack
    addi a0, a0, 1        # Add 1 to ${a0}
    jr   ra               # Return to ${ra}
end:
```

This is the assembly for a simple combination function. It produces the value of nCk using Pascal's rule

$$
C^{n}_{m} = C^{n-1}_{m-1} + C^{n-1}_{m}
$$

which can be applied recursively to produce the answer. However, this method is slow when n is large, because every base case adds only 1 to the result, so the number of base-case calls equals the value of nCk itself.

## Transform assembly into ELF file
### From assembly to ELF

According to the [README.md](https://github.com/riscv/riscv-compliance/blob/master/README.md) in [riscv-compliance](https://github.com/riscv/riscv-compliance), I need to compile the .S file into an .elf file.

Quoted from the [README.md](https://github.com/riscv/riscv-compliance/blob/master/README.md)
>The only setup required is to define where the toolchain is found, and where the target / device is found.
For the toolchain, the binaries must be in the search path and the compiler prefix is defined on the make line. The default value for this is
```
RISCV_PREFIX ?= riscv64-unknown-elf-
```
>The path to the RUN_TARGET is defined within the riscv-target Makefile.include. To run the rv32i test suite on riscvOVPsim
```
make RISCV_TARGET=riscvOVPsim RISCV_DEVICE=rv32i
```

So I need to specify `RISCV_PREFIX`, `RISCV_TARGET`, and `RISCV_DEVICE`, setting each one to match my RISC-V toolchain:

- `riscv-none-embed` is the newly adopted name for either `riscv64-unknown-elf` or `riscv32-unknown-elf`.
- `riscvOVPsim` is the simulator used in this example.
- `rv32i` is the base RISC-V ISA, which supports only integer operations.

The following lines are written in `/riscv-test-suite/rv32i/Makefile`
```
RISCV_PREFIX := riscv-none-embed
RISCV_TARGET := riscvOVPsim
RISCV_DEVICE := rv32i
```

However, two more variables in the rv32i `Makefile` need to be modified: `TARGETDIR` and `ROOTDIR`. `TARGETDIR` refers to the location of the `Makefile.include` in `/riscv-target/riscvOVPsim/device/rv32i/`, and `ROOTDIR` is the root path under which the /work directory is created. The /work directory holds the elf output along with the objdump and signature files. That is, I need to specify the absolute path for each of them.
```
TARGETDIR:=/home/nober/git/adv_CO/riscv-compliance/riscv-target
ROOTDIR := /home/nober/git/adv_CO/riscv-compliance
```

:::danger
Avoid hardcoded paths such as `home/nober`. Instead, you can set via environment variables.
:notes: jserv
:::

At this point the .S file still cannot be compiled into an .elf file, because make fails with an error saying the simulator, `riscvOVPsim`, cannot be found. So I finally modified the path of `riscvOVPsim` in Makefile.include.
```
TARGET_SIM ?= /home/nober/git/adv_CO/riscv-compliance/riscv-ovpsim/bin/Linux64/riscvOVPsim.exe
```

Typing the following command generates the elf file in the /work directory.
```
make I-ADD-01.elf
```

The hierarchy of /work
```shell
work
├── I-ADD-01.elf
├── I-ADD-01.elf.objdump
└── rv32i
    └── I-ENDIANESS-01.elf.objdump
```

### Generate the ground truth for this test file

To test `Reindeer`, we need another simulator to generate the reference answer; riscvOVPsim is used in this example.
```
%.log: %.elf
	$(RUN_TARGET)
```

Simply typing `make I-ADD-01.log` generates the `log` file and the `signature` file. They are shown below in that order.
```log
Info Session started: Mon Nov 11 14:09:54 2019
Info ------------- ENVIRONMENT -------------
Info --------------------------------------
Info -------- FLAGS (from /home/nober/git/adv_CO/riscv-compliance/riscv-ovpsim/bin/Linux64/riscvOVPsim.exe)
Info --variant RV32I
Info --program /home/nober/git/adv_CO/riscv-compliance/work//I-ADD-01.elf
Info --signaturedump
Info --customcontrol
Info --override riscvOVPsim/cpu/sigdump/SignatureFile=/home/nober/git/adv_CO/riscv-compliance/work//I-ADD-01.signature.output
Info --override riscvOVPsim/cpu/sigdump/ResultReg=3
Info --override riscvOVPsim/cpu/simulateexceptions=T
Info --override riscvOVPsim/cpu/defaultsemihost=F
Info --logfile /home/nober/git/adv_CO/riscv-compliance/work//I-ADD-01.log
Info --override riscvOVPsim/cpu/user_version=2.3
Info --override riscvOVPsim/cpu/priv_version=1.11
Info --------------------------------------
Imperas riscvOVPsim
CpuManagerFixedPlatform (64-Bit) v20190923.0 Open Virtual Platform simulator from www.IMPERAS.com.
Copyright (c) 2005-2019 Imperas Software Ltd. Contains Imperas Proprietary Information.
Licensed Software, All Rights Reserved.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
CpuManagerFixedPlatform started: Mon Nov 11 14:09:54 2019
Info (OR_OF) Target 'riscvOVPsim/cpu' has object file read from '/home/nober/git/adv_CO/riscv-compliance/work//I-ADD-01.elf'
Info (OR_PH) Program Headers:
Info (OR_PH) Type           Offset     VirtAddr   PhysAddr   FileSiz    MemSiz     Flags Align
Info (OR_PD) LOAD           0x00001000 0x80000000 0x80000000 0x000003c4 0x000003c4 R-E   1000
Info (OR_PD) LOAD           0x00002000 0x80001000 0x80001000 0x00001204 0x00001204 RW-   1000
Info (SIGNATURE_DUMP) Found Symbol 'begin_signature' in application at 0x80002000
Info (SIGNATURE_DUMP) Found Symbol 'end_signature' in application at 0x80002090
Info (SIGNATURE_DUMP) Signature File enabled, file '/home/nober/git/adv_CO/riscv-compliance/work//I-ADD-01.signature.output'.
Info (SIGNATURE_DUMP) Extracting signature from 0x80002000 size 144 bytes
Info (SIGNATURE_DUMP) Symbol 'begin_signature' at 0x80002000
Info (SIGNATURE_DUMP) Symbol 'end_signature' at 0x80002090
Info (SIGNATURE_DUMP) Intercept 'write_tohost'. Generate Signature file
fffff5cbfffffffffffff80200000000
800000000765432000001a3480000000
8000000000001a340765432080000000
00000000fffff802fffffffefffff5cb
fffff5cbfffffffffffff802ffffffff
800000000765432000001a3480000000
8000000000001a340765432080000000
00000000fffff802fffffffefffff5cb
00000000ffffffffffffffffffffffff
Test PASSED
Info
Info ---------------------------------------------------
Info CPU 'riscvOVPsim/cpu' STATISTICS
Info Type : riscv (RV32I)
Info Nominal MIPS : 100
Info Final program counter : 0x80000044
Info Simulated instructions: 212
Info Simulated MIPS : run too short for meaningful result
Info ---------------------------------------------------
Info
Info ---------------------------------------------------
Info SIMULATION TIME STATISTICS
Info Simulated time : 0.00 seconds
Info User time : 0.00 seconds
Info System time : 0.00 seconds
Info Elapsed time : 0.00 seconds
Info ---------------------------------------------------
CpuManagerFixedPlatform finished: Mon Nov 11 14:09:54 2019
CpuManagerFixedPlatform (64-Bit) v20190923.0 Open Virtual Platform simulator from www.IMPERAS.com.
Visit www.IMPERAS.com for multicore debug, verification and analysis solutions.
Info Session ended: Mon Nov 11 14:09:54 2019
```

```shell
00000000
fffff802
00001a34
...
00001a34
80000000
ffffffff
00000000
```

## Rewrite as part of test suite

Before choosing this assembly code as my test file, I had first chosen a 9x9 multiplication table. However, the checking facilities available in a test file are not as high level as `assert` in C/C++, and it was hard for me to change the address of the array dynamically, so I gave up on that option.

The first thing to note here is that the test file must not contain any unnecessary section directive such as `.text`. Otherwise the generated file ends up with wrong addresses.

```cpp
#include "riscv_test_macros.h"
#include "compliance_test.h"
#include "compliance_io.h"

RV_COMPLIANCE_RV32M

RV_COMPLIANCE_CODE_BEGIN

    RVTEST_IO_INIT
    RVTEST_IO_ASSERT_GPR_EQ(x31, x0, 0x00000000)
    RVTEST_IO_WRITE_STR(x31, "Test begin Reserved regs ra(x1) a0(x10) t0(x5)\n")

    # ----------------------------------------------------------------------------------
initial1:
    la x5, test_1_res
    li x18, 0x7                 # n of case 1
    li x19, 0x5                 # k of case 1
    li x20, 0                   # return value
main1:
    sw x18, 0(x5)
    sw x19, 4(x5)
    jal comb
    sw x20, 8(x5)

    RVTEST_IO_CHECK()
    RVTEST_IO_ASSERT_GPR_EQ(x5, x18, 0x7)
    RVTEST_IO_ASSERT_GPR_EQ(x5, x19, 0x5)
    RVTEST_IO_WRITE_STR(x31, "# Argument test - complete\n")
    RVTEST_IO_ASSERT_GPR_EQ(x5, x20, 0x00000015)
    RVTEST_IO_WRITE_STR(x31, "# Combination test - complete")

    #-----------------------------------------------------------------------------------------
initial2:
    la x5, test_2_res
    li x18, 0x3                 # n of case 2
    li x19, 0x2                 # k of case 2
    li x20, 0                   # return value
main2:
    sw x18, 0(x5)
    sw x19, 4(x5)
    jal comb
    sw x20, 8(x5)

    RVTEST_IO_CHECK()
    RVTEST_IO_ASSERT_GPR_EQ(x5, x18, 0x3)
    RVTEST_IO_ASSERT_GPR_EQ(x5, x19, 0x2)
    RVTEST_IO_WRITE_STR(x31, "# Argument test - complete\n")
    RVTEST_IO_ASSERT_GPR_EQ(x5, x20, 0x00000003)
    RVTEST_IO_WRITE_STR(x31, "# Combination test - complete")

    j end                       # Jump over comb only after the checks above,
                                # otherwise they would be unreachable

comb:
    addi x2, x2, -12
    sw x1, 8(x2)
    sw x18, 4(x2)
    sw x19, 0(x2)
    beq x18, x19, if
    beq x19, x0, if
    addi x18, x18, -1
    addi x19, x19, -1
    jal comb
    lw x19, 0(x2)
    jal comb
    lw x1, 8(x2)
    lw x18, 4(x2)
    lw x19, 0(x2)
    addi x2, x2, 12
    jr x1
if:
    addi x2, x2, 12
    addi x20, x20, 1
    jr x1
end:
    RV_COMPLIANCE_HALT

RV_COMPLIANCE_CODE_END

# Input data section.
    .data

# Output data section.
RV_COMPLIANCE_DATA_BEGIN
    .align 4

test_1_res:
    .fill 3, 4, -1
test_2_res:
    .fill 3, 4, -1

RV_COMPLIANCE_DATA_END
```

I use `x18`, `x19`, and `x20` as arg1, arg2, and the return value of the combination function. Before compiling this file, its name must be added to `rv32i_sc_tests` in `Makefrag`. After that, it compiles into elf format like any other test.

```
make comb.elf
make comb.log
```

This generates the files in the /work directory.

```shell
work
├── comb.elf
├── comb.elf.objdump
├── comb.log
├── comb.signature.output
├── I-ADD-01.elf
├── I-ADD-01.elf.objdump
├── I-ADD-01.log
├── I-ADD-01.signature.output
└── rv32i
    └── I-ENDIANESS-01.elf.objdump
```

Anyway, my test is very simple: it just calculates $C^{7}_{5}$ and $C^{3}_{2}$. The signature is as follows.

```shell
00000007 # The arg1 of case 1
00000005 # The arg2 of case 1
00000015 # The return value
00000003 # Case 2 ...
00000002
00000003
00000000
00000000
```

## Generate the vcd file

After the elf and log files have been built, we can simply copy them into /sim/compliance/ in [Reindeer]() and type `make test comb`.

```shell
$ make test comb
```

This generates `comb.vcd`, which can be viewed with [gtkwave]().

```shell
$ gtkwave comb.vcd
```

![](https://i.imgur.com/YWOWtIV.png)

::: success
// TODO : Analyze the comb.vcd
Update at 11/21
:::

After tracing the Reindeer core, I found that its register file uses dual-port RAM so that reading and writing can happen simultaneously. It is implemented with two BRAMs, one for each read port.
```shell
dual_port_ram #(.ADDR_WIDTH (`REG_ADDR_BITS), .DATA_WIDTH (`XLEN)) single_clk_ram_rs1 (
    .waddr (write_addr), /* Synchronize the memory */
    .raddr (read_rs1_addr),
    .din (write_data_in),
    .write_en (write_enable),
    .clk (clk),
    .dout (read_rs1_data_out_i) );

dual_port_ram #(.ADDR_WIDTH (`REG_ADDR_BITS), .DATA_WIDTH (`XLEN)) single_clk_ram_rs2 (
    .waddr (write_addr), /* Synchronize the memory */
    .raddr (read_rs2_addr),
    .din (write_data_in),
    .write_en (write_enable),
    .clk (clk),
    .dout (read_rs2_data_out_i) );
```

Let's take a look at the waveform. Before that, I select `PC_in`, `PC_out`, and the registers I used. In comb.S, I use `x18`, `x19`, and `x20`.

- `PC_in`
- `PC_out`
- `mem(18)` : `x18`, which is n
- `mem(19)` : `x19`, which is k
- `mem(20)` : `x20`, which is the return value

And here is the disassembly of the elf file.

```shell
Disassembly of section .text.init:

80000000 <_start>:
80000000:   04c0006f    j       8000004c <reset_vector>

80000004 <trap_vector>:
80000004:   34202f73    csrr    t5,mcause
80000008:   00800f93    li      t6,8
8000000c:   03ff0a63    beq     t5,t6,80000040 <write_tohost>
80000010:   00900f93    li      t6,9
80000014:   03ff0663    beq     t5,t6,80000040 <write_tohost>
80000018:   00b00f93    li      t6,11
8000001c:   03ff0263    beq     t5,t6,80000040 <write_tohost>
80000020:   80000f17    auipc   t5,0x80000
80000024:   fe0f0f13    addi    t5,t5,-32 # 0 <_start-0x80000000>
80000028:   000f0463    beqz    t5,80000030 <trap_vector+0x2c>
8000002c:   000f0067    jr      t5
80000030:   34202f73    csrr    t5,mcause
80000034:   000f5463    bgez    t5,8000003c <handle_exception>
80000038:   0040006f    j       8000003c <handle_exception>
...
```

The signals `PC_in` and `PC_out` are the program counter of the PulseRain MCU. Note, however, that the MCU has several signals named `PC_in` and `PC_out`, one pair for each stage, i.e. IF, ID, EXE, etc.

The first instruction is a jump to `0x8000004c`. Let's look at the waveform to verify this instruction.
![](https://i.imgur.com/WrQJrvX.png)

In the EXE stage, `branch_addr` is `0x8000004c`, which matches comb.objdump.

Now, let's skip to the execution of the test code.

```shell
8000011c <main1>:
8000011c:   0122a023    sw      s2,0(t0)
80000120:   0132a223    sw      s3,4(t0)
80000124:   030000ef    jal     ra,80000154 <comb>
80000128:   0142a423    sw      s4,8(t0)
```

The waveform is shown below. For the first instruction, `0x8000011c sw s2, 0(t0)`, we expect rs1 = x5, rs2 = x18, and offset = 0.
As we can see, rs1 is `x5`, rs2 is `x18`, and the immediate offset is zero.

![](https://i.imgur.com/plAwssO.png)

In the first test, n and k are stored in `x18` (7) and `x19` (5), respectively. That is, $C^{7}_{5} = 21$.

```
80000154 <comb>:
80000154:   ff410113    addi    sp,sp,-12
80000158:   00112423    sw      ra,8(sp)
8000015c:   01212223    sw      s2,4(sp)
80000160:   01312023    sw      s3,0(sp)
80000164:   03390863    beq     s2,s3,80000194 <if>
80000168:   02098663    beqz    s3,80000194 <if>
8000016c:   fff90913    addi    s2,s2,-1
80000170:   fff98993    addi    s3,s3,-1
80000174:   fe1ff0ef    jal     ra,80000154 <comb>
80000178:   00012983    lw      s3,0(sp)
8000017c:   fd9ff0ef    jal     ra,80000154 <comb>
80000180:   00812083    lw      ra,8(sp)
80000184:   00412903    lw      s2,4(sp)
80000188:   00012983    lw      s3,0(sp)
8000018c:   00c10113    addi    sp,sp,12
80000190:   00008067    ret
```

According to my code:

```
8000015c:   01212223    sw      s2,4(sp)
80000160:   01312023    sw      s3,0(sp)
```

For the instruction at `15c`, `rs2` must be `x18` and `rs2_in` must be 7.
For the instruction at `160`, `rs2` must be `x19` and `rs2_in` must be 5.

- Instruction `15c`

![](https://i.imgur.com/tCZA3c9.png)

- Instruction `160`

![](https://i.imgur.com/zlJyXgI.png)

The final answer is in `x20` when execution returns to `128`.

```
8000011c <main1>:
8000011c:   0122a023    sw      s2,0(t0)
80000120:   0132a223    sw      s3,4(t0)
80000124:   030000ef    jal     ra,80000154 <comb>
80000128:   0142a423    sw      s4,8(t0)
```

We can see the answer is 0x15, 21 in decimal.
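
As a cross-check on that value, the same recursion can be written as a small C++ reference model. This is only a sketch: the function name `comb_ref` is hypothetical and not part of the test suite or Reindeer, and it assumes `k <= n`, the same precondition the assembly relies on.

```cpp
#include <cstdint>

// Hypothetical reference model (not part of riscv-compliance or Reindeer).
// It applies the same recursion as the comb routine:
//   C(n, k) = C(n-1, k-1) + C(n-1, k), with C(n, n) = C(n, 0) = 1.
// Assumes k <= n, as the assembly does.
constexpr std::uint32_t comb_ref(std::uint32_t n, std::uint32_t k) {
    return (k == 0 || k == n) ? 1u
                              : comb_ref(n - 1, k - 1) + comb_ref(n - 1, k);
}

// Compile-time checks against the two cases stored in the signature.
static_assert(comb_ref(7, 5) == 0x15, "case 1: C(7,5) should be 0x15 (21)");
static_assert(comb_ref(3, 2) == 0x3, "case 2: C(3,2) should be 3");
```

Both checks hold at compile time, which agrees with the `00000015` and `00000003` words in the signature. The waveform that follows shows the same value of case 1 in `x20`.
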
![](https://i.imgur.com/arb7LJA.png)

## How Reindeer works with Verilator

Verilator converts synthesizable Verilog code, plus some synthesis, SystemVerilog, and a small subset of Verilog-AMS constructs, into C++ or SystemC code. It is not a traditional simulator but a compiler: it compiles the Verilog code into C++ or SystemC code. It is a fantastic tool for me to run Verilog without a commercial application such as ModelSim or Intel Quartus.

The Makefile in `Reindeer/sim/verilator/Makefile` builds the UUT, `VPulseRain_RV2T_MCU`, from the submodules in Reindeer/submodules/PulseRain_MCU and compiles it in /obj_dir, which ends up full of modules written in C++.

### In `tb_PulseRain_RV2T.cpp`

The test bench, `tb_PulseRain_RV2T.cpp`, creates a new `UUT` object, which is the Reindeer simulator, `VPulseRain_RV2T_MCU`.

`tb_PulseRain_RV2T.cpp` also extracts the values it needs from the elf file.

```cpp
ref_file_process(ref_file);
/* Extract the begin_signature and end_signature */
elf_label_process(elf_file);
prepare_elf_section_list(elf_file);
```

It resets the UUT input wires with the following code.
`testbench` is a class that evaluates the simulator given the UUT and `tfp`, a `VerilatedVcdC*`.

```cpp
uut = new UUT; // Create instance

/* Create a VCD dump file in c standalone (no SystemC) simulations */
VerilatedVcdC* tfp = new VerilatedVcdC;

if (!trace_file.empty()) {
    Verilated::traceEverOn(true);
    uut->trace (tfp, 99);
    tfp->open (trace_file.c_str());
} else {
    tfp = NULL;
}

testbench t {10, uut, tfp};

uut->reset_n = 0; // Set some inputs
uut->sync_reset = 0;

std::cout << "\n=============> reset..." << "\n";
/* Initialize the uut */
uut->sync_reset = 0;
uut->ocd_reg_we = 0;
uut->ocd_reg_read_addr = 0;
uut->ocd_reg_write_addr = 0;
uut->ocd_reg_write_data = 0;
uut->ocd_read_enable = 0;
uut->ocd_write_enable = 0;
uut->ocd_rw_addr = 0;
uut->ocd_write_word = 0;
uut->start = 0;
uut->start_address = start_address;

t.reset();

std::cout << "=============> init stack ..." << "\n";
uut->ocd_reg_we = 1;
uut->ocd_reg_write_addr = 2; // SP
uut->ocd_reg_write_data = DEFAULT_STACK_INIT_VALUE;
t.run();
uut->ocd_reg_we = 0;
t.run();
```

## Hold and Load

The traditional method to bootstrap a CPU is:

1. Make a boot loader in software.
2. Store the boot loader in a ROM.
3. After power-on reset, the boot loader is executed first and moves the rest of the code/data into RAM. The PC is then set to the `_start` address of the new image.

![](https://i.imgur.com/Su4SnE9.png)

Quoted from the [README.md] of [Reindeer]()
>After reset, the soft CPU will be put into a hold state, and it will have access to the UART TX port by default. But a valid debug frame sending from the host PC can let OCD to reconfigure the mux and switch the UART TX to OCD side, for which the memory can be accessed, and the control frames can be exchanged. A new software image can be loaded into the memory during the CPU hold state, which gives rise to the name "hold-and-load".
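
The hold-and-load sequence in the quote can be pictured with a toy C++ model. This is a sketch under stated assumptions, not the Reindeer implementation: the class `ToyHoldAndLoadCpu`, its member names, and the helper `demo_hold_and_load` are all hypothetical, and memory is modeled as a plain map. The only values taken from this note are the address `0x80000000` and the instruction word `0x04c0006f` from the disassembly of `_start`.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>

// Toy model only: these names are hypothetical, not taken from Reindeer.
class ToyHoldAndLoadCpu {
public:
    // Host side: store one word of the image while the CPU is held.
    void load_word(std::uint32_t addr, std::uint32_t word) {
        if (running_) {
            throw std::logic_error("image must be loaded during the hold state");
        }
        memory_[addr] = word;
    }

    // Host side: release the hold and set the PC to the image entry point.
    void start(std::uint32_t start_address) {
        if (memory_.count(start_address) == 0) {
            throw std::logic_error("nothing loaded at the start address");
        }
        pc_ = start_address;
        running_ = true;
    }

    bool running() const { return running_; }
    std::uint32_t pc() const { return pc_; }

    // Word the CPU would fetch first after being released.
    std::uint32_t fetch() const { return memory_.at(pc_); }

private:
    std::map<std::uint32_t, std::uint32_t> memory_;
    std::uint32_t pc_ = 0;
    bool running_ = false;  // false models the hold state after reset
};

// Walks through the sequence: hold after reset, load the image, then start.
inline bool demo_hold_and_load() {
    ToyHoldAndLoadCpu cpu;
    const bool held_after_reset = !cpu.running();
    cpu.load_word(0x80000000u, 0x04c0006fu);  // `j 8000004c` at _start
    cpu.start(0x80000000u);
    return held_after_reset && cpu.running() &&
           cpu.pc() == 0x80000000u && cpu.fetch() == 0x04c0006fu;
}
```

In this picture no boot loader runs from ROM. The host fills memory while the core is held, and execution begins directly at the start address of the loaded image.
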