# Caravel SoC Summarization

[Reference of SoC](https://github.com/bol-edu/caravel-soc_fpga/tree/main)

## File .c .h .S .ld/lds

### Hardware Design (.v):
* Role: Describes the hardware, including the SoC's structure and peripherals.
* Relationship: These files define the SoC's architecture and hardware components (such as the processor, UART, GPIO), which the software components later interface with.
* Flow: These files are synthesized into logic gates and layout designs, and they determine how the hardware behaves during execution.

### Assembly Code (.S, .s):
* Role: Contains low-level boot code that initializes the system, setting up the processor, memory, and peripherals.
* Relationship: These files interact directly with the hardware. The `.S` files may include macros or higher-level assembly for initializing the SoC's processor (e.g., setting up the stack pointer, jumping to the main program).
* Flow: These files are the first to execute upon power-up, before jumping to C code for further system setup.

**`.S` file**: An assembly file that contains macros and preprocessor directives. During the build, it is first run through the preprocessor (for example, to expand macros) and only then passed to the assembler.

**`.s` file**: An assembly file that has already been preprocessed. It is pure assembly language with no macros or preprocessor directives, ready to be passed directly to the assembler.

### C Source Code (.c):
* Role: Contains higher-level software that controls peripherals, implements drivers, and runs application logic.
* Relationship: The C code interacts with the hardware through register mappings and peripheral drivers. It relies on the system initialization performed by the assembly code (e.g., `.S`), and it depends on the hardware layout defined in `.v`.
* Flow: The C code is compiled and linked into an executable, which runs on the processor once the boot process completes.

### Linker Script (.ld, .lds):
* Role: Defines the memory mapping for the various sections, including where the text (code), data, and stack reside in memory.
* Relationship: The linker script is used during the build process to organize the program's memory layout. It ensures that the C code and assembly code are placed at the correct locations in memory.
* Flow: The linker uses the linker script to generate the final executable, dictating where the different sections (e.g., `.text`, `.data`) are located in memory.
* `.ld` and `.lds` are essentially the same; both are GNU linker scripts.

### Makefile (.mak):
* Role: Manages the build process by specifying how to compile the code, which files to compile, and how to link everything together.
* Relationship: The Makefile ties everything together, specifying the dependencies between `.c`, `.S`, `.v`, and linker scripts. It ensures that the proper compilation and linking steps are taken.
* Flow: The Makefile executes the build process, compiling source files and linking them into a final executable.

### Flowchart
![lab4en_flowchart](https://hackmd.io/_uploads/ryuzRTg0kl.png)

## Analysis Flow (ex. counter_la)

### run_xsim
#path : testbench/counter_la/run_xsim
```
rm -f counter_la.hex
rm -rf xsim.dir/ *.log *.pb *.jou *.wdb

riscv32-unknown-elf-gcc -Wl,--no-warn-rwx-segments -g \
    -I../../firmware \
    -march=rv32i -mabi=ilp32 -D__vexriscv__ \
    -Wl,-Bstatic,-T,../../firmware/sections.lds,--strip-discarded \
    -ffreestanding -nostdlib -o counter_la.elf ../../firmware/crt0_vex.S ../../firmware/isr.c counter_la.c
riscv32-unknown-elf-objcopy -O verilog counter_la.elf counter_la.hex

# to fix flash base address
sed -ie 's/@10/@00/g' counter_la.hex
rm -f counter_la.elf counter_la.hexe

xvlog -d FUNCTIONAL -d SIM -d DUNIT_DELAY=#1 -d USE_POWER_PINS -f ./include.rtl.list.xsim counter_la_tb.v
xelab -top counter_la_tb -snapshot counter_la_tb_elab
xsim counter_la_tb_elab -R
```
* The `rm` commands remove the outputs of the previous run.
* The `riscv32-unknown-elf-gcc` command uses the **RISC-V GCC cross-compiler** to compile the C code for the RISC-V architecture.
* `-I../../firmware`: Adds the `../../firmware` directory to the list of include paths for header files.
* `-T,../../firmware/sections.lds`: Specifies a custom linker script (`sections.lds`) that defines the memory sections for the link step.
* `-ffreestanding`: Tells the compiler that the program is freestanding, meaning it does not run in a standard operating system environment.
* `-nostdlib`: Instructs the compiler driver not to link the standard library or the default startup files.
* `-o counter_la.elf`: Specifies the output file name as `counter_la.elf`.
* `../../firmware/crt0_vex.S ../../firmware/isr.c counter_la.c`: These are the source files. `crt0_vex.S` provides the startup code that initializes the system, `isr.c` provides the interrupt service routines, and `counter_la.c` contains the main application logic.
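
The `sed` step in the script can be read as a rebasing of address records. The sketch below assumes that `objcopy -O verilog` emits address records such as `@10000000` (the CPU-side flash address from `sections.lds`) and that the simulated memory is indexed from offset 0, which is why `@10` is rewritten to `@00`. The helper name `rebase_hex_record` is made up for illustration; it mirrors the `sed` expression for one line and is not part of the repository. As a side note, GNU `sed -ie` treats `e` as a backup suffix, which would explain why the script then deletes `counter_la.hexe`.

```c
#include <string.h>

/* Hypothetical helper: apply the same rewrite as `sed 's/@10/@00/'`
 * to one line of a Verilog hex file, in place.
 * Only address records that start with "@10" are changed;
 * data lines (hex bytes) are left untouched.
 * Returns 1 if the line was rewritten, 0 otherwise. */
int rebase_hex_record(char *line)
{
    if (strncmp(line, "@10", 3) != 0)
        return 0;
    line[1] = '0';   /* "@10000000" becomes "@00000000" */
    return 1;
}
```

For example, the record `@10000000` becomes `@00000000`, while a data line such as `6F 00 00 0B` stays unchanged.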
### include.rtl.list.xsim
#path : testbench/counter_la/include.rtl.list.xsim
```
## Headers
../../rtl/header/defines.v
../../rtl/header/user_defines.v

## User project
../../rtl/user/user_project_wrapper.v
../../rtl/user/user_proj_example.counter.v

## VIP
../../vip/tbuart.v
../../vip/bram.v
##../../vip/spiflash.v

## DFFRAM Behavioral Model
../../vip/RAM256.v
../../vip/RAM128.v

## Mgmt Core Wrapper
../../rtl/soc-efabless/mgmt_core.v
../../rtl/soc-efabless/mgmt_core_wrapper.v
../../rtl/soc-efabless/VexRiscv_MinDebugCache.v

## These blocks need to stay in RTL
../../rtl/soc/mprj_io.v

## These blocks only needed for RTL sims
../../rtl/soc-efabless/housekeeping_spi.v
../../rtl/soc/spiflash.v
../../rtl/soc/chip_io.v
../../rtl/soc/gpio_control_block.v
../../rtl/soc/gpio_defaults_block.v
../../rtl/soc/housekeeping.v
../../rtl/soc/caravel.v
```
![include_flow](https://hackmd.io/_uploads/ryZWsub0ye.png)

### sections.lds
#path : firmware/sections.lds
```
/* INCLUDE ../../generated/output_format.ld */
OUTPUT_FORMAT("elf32-littleriscv")

ENTRY(_start)

__DYNAMIC = 0;

/* INCLUDE ../../generated/regions.ld */
MEMORY {
    vexriscv_debug : ORIGIN = 0xf00f0000, LENGTH = 0x00000100
    dff : ORIGIN = 0x00000000, LENGTH = 0x00000400
    dff2 : ORIGIN = 0x00000400, LENGTH = 0x00000200
    flash : ORIGIN = 0x10000000, LENGTH = 0x01000000
    mprj : ORIGIN = 0x30000000, LENGTH = 0x00100000
    hk : ORIGIN = 0x26000000, LENGTH = 0x00100000
    csr : ORIGIN = 0xf0000000, LENGTH = 0x00010000
}

SECTIONS
{
    .text :
    {
        _ftext = .;
        /* Make sure crt0 files come first, and they, and the isr */
        /* don't get disposed of by greedy optimisation */
        *crt0*(.text)
        KEEP(*crt0*(.text))
        KEEP(*(.text.isr))

        *(.text .stub .text.* .gnu.linkonce.t.*)
        _etext = .;
    } > flash

    .rodata :
    {
        . = ALIGN(8);
        _frodata = .;
        *(.rodata .rodata.* .gnu.linkonce.r.*)
        *(.rodata1)
        . = ALIGN(8);
        _erodata = .;
    } > flash

    .data :
    {
        . = ALIGN(8);
        _fdata = .;
        *(.data .data.* .gnu.linkonce.d.*)
        *(.data1)
        _gp = ALIGN(16);
        *(.sdata .sdata.* .gnu.linkonce.s.*)
        . = ALIGN(8);
        _edata = .;
    } > dff AT > flash

    .bss :
    {
        . = ALIGN(8);
        _fbss = .;
        *(.dynsbss)
        *(.sbss .sbss.* .gnu.linkonce.sb.*)
        *(.scommon)
        *(.dynbss)
        *(.bss .bss.* .gnu.linkonce.b.*)
        *(COMMON)
        . = ALIGN(8);
        _ebss = .;
        _end = .;
    } > dff
}

PROVIDE(_fstack = ORIGIN(dff2) + LENGTH(dff2));

PROVIDE(_fdata_rom = LOADADDR(.data));
PROVIDE(_edata_rom = LOADADDR(.data) + SIZEOF(.data));
```
:::info
In the `MEMORY` block, the region names can be edited, but each region must correspond to a physical address range that actually exists in the SoC.
:::
* `.text` section: typically holds the program's executable code.
* `> flash` means this section is placed in the flash memory region defined earlier: `flash : ORIGIN = 0x10000000, LENGTH = 0x01000000`
* `*crt0*(.text)` means "include the `.text` section of any input file whose name matches `*crt0*`."
* `KEEP(*crt0*(.text))`: Forces the linker to keep the matched sections even if they appear unused. Normally, unused sections may be discarded by the linker.
* `*(.text .stub .text.* .gnu.linkonce.t.*)` collects all executable code, including: normal functions (`.text`); startup helpers or trampolines (`.stub`); specialized sub-sections of code (`.text.*`), e.g. `.text.main`; and deduplicated functions (e.g., C++ templates) from GNU's link-once mechanism (`.gnu.linkonce.t.*`).
* `. = ALIGN(8)`: Aligns the location counter (`.`) to the next 8-byte boundary. For example, if the location counter is at `0x1003`, it moves to `0x1008`.
* `_fstack = ORIGIN(dff2) + LENGTH(dff2) = 0x00000400 + 0x00000200 = 0x00000600`
* `> dff AT > flash` means that the `.data` section lives in the `dff` region at runtime, but its initial contents are stored in the `flash` region. In other words, the section's data is stored in `flash` (external storage), and when the program starts, it is copied to `dff` (internal memory) for use at runtime.
* Because of `> dff AT > flash`, the addresses of `_fdata_rom` and `_edata_rom` are determined by the linker. To see the actual values, check the dump file (the disassembled code) by adding this line to `run_xsim`: `riscv32-unknown-elf-objdump -D counter_la.elf > dump.out`

:::info
`PROVIDE(_fdata_rom = LOADADDR(.data));`: `_fdata_rom` is the address of `.data` in `flash`.
:::
:::info
* Functions or executable code → `.text` section
* Constant values (e.g., const variables) → `.rodata` section
* Initialized global or static variables → `.data` section
* Uninitialized global or static variables → `.bss` section
:::

### crt0_vex.S
#path : firmware/crt0_vex.S
```
.global main
.global isr
.global _start

_start:
  j crt_init
  nop
  nop
  nop
  nop
  nop
  nop
  nop

.global trap_entry
trap_entry:
  sw x1,  - 1*4(sp)
  sw x5,  - 2*4(sp)
  sw x6,  - 3*4(sp)
  sw x7,  - 4*4(sp)
  sw x10, - 5*4(sp)
  sw x11, - 6*4(sp)
  sw x12, - 7*4(sp)
  sw x13, - 8*4(sp)
  sw x14, - 9*4(sp)
  sw x15, -10*4(sp)
  sw x16, -11*4(sp)
  sw x17, -12*4(sp)
  sw x28, -13*4(sp)
  sw x29, -14*4(sp)
  sw x30, -15*4(sp)
  sw x31, -16*4(sp)
  addi sp,sp,-16*4
  call isr
  lw x1 , 15*4(sp)
  lw x5,  14*4(sp)
  lw x6,  13*4(sp)
  lw x7,  12*4(sp)
  lw x10, 11*4(sp)
  lw x11, 10*4(sp)
  lw x12,  9*4(sp)
  lw x13,  8*4(sp)
  lw x14,  7*4(sp)
  lw x15,  6*4(sp)
  lw x16,  5*4(sp)
  lw x17,  4*4(sp)
  lw x28,  3*4(sp)
  lw x29,  2*4(sp)
  lw x30,  1*4(sp)
  lw x31,  0*4(sp)
  addi sp,sp,16*4
  mret
  .text

crt_init:
  la sp, _fstack
  la a0, trap_entry
  csrw mtvec, a0

data_init:
  la a0, _fdata
  la a1, _edata
  la a2, _fdata_rom
data_loop:
  beq a0,a1,data_done
  lw a3,0(a2)
  sw a3,0(a0)
  add a0,a0,4
  add a2,a2,4
  j data_loop
data_done:

bss_init:
  la a0, _fbss
  la a1, _ebss
bss_loop:
  beq a0,a1,bss_done
  sw zero,0(a0)
  add a0,a0,4
#ifndef SIM
  j bss_loop
#endif
bss_done:

  li a0, 0x880  //880 enable timer + external interrupt sources (until mstatus.MIE is set, they will never trigger an interrupt)
  csrw mie,a0

  call main
infinit_loop:
  j infinit_loop
```
1. `_start` jumps to `crt_init`, which sets `sp` to `_fstack` (top of `dff2`: `0x00000600`) and writes the address of `trap_entry` to `mtvec`.
2. If an interrupt occurs, the CPU goes to `trap_entry`, which saves the caller-saved registers, calls `isr`, restores the registers, and returns with `mret`.
3. `data_init` copies `.data` word by word from `_fdata_rom` (in `flash`, region base `0x10000000`) to `_fdata` ~ `_edata` (in `dff`) until the destination pointer reaches `_edata`.
4. `bss_init` initializes the uninitialized global variables (`.bss`) to 0. Note that the `j bss_loop` is compiled out when `SIM` is defined.
5. The `call main` instruction enters `main` in `counter_la.c`.

### counter_la.c
#path : testbench/counter_la/counter_la.c
```clike
    // Set UART clock to 64 kbaud (enable before I/O configuration)
    // reg_uart_clkdiv = 625;
    reg_uart_enable = 1;

    // Now, apply the configuration
    reg_mprj_xfer = 1;
    while (reg_mprj_xfer == 1);

    // Configure LA probes [31:0], [127:64] as inputs to the cpu
    // Configure LA probes [63:32] as outputs from the cpu
    reg_la0_oenb = reg_la0_iena = 0x00000000;    // [31:0]
    reg_la1_oenb = reg_la1_iena = 0xFFFFFFFF;    // [63:32]
    reg_la2_oenb = reg_la2_iena = 0x00000000;    // [95:64]
    reg_la3_oenb = reg_la3_iena = 0x00000000;    // [127:96]

    // Flag start of the test
    reg_mprj_datal = 0xAB400000;

    // Set Counter value to zero through LA probes [63:32]
    reg_la1_data = 0x00000000;

    // Configure LA probes from [63:32] as inputs to disable counter write
    reg_la1_oenb = reg_la1_iena = 0x00000000;

    while (1) {
        if (reg_la0_data_in > 0x1F4) {
            reg_mprj_datal = 0xAB410000;
            break;
        }
    }
    //print("\n");
    //print("Monitor: Test 1 Passed\n\n");    // Makes simulation very long!
    reg_mprj_datal = 0xAB510000;
```
* I skip the upper part of the C code, which sets up the GPIOs: "The upper GPIO pins are configured to be output and accessible to the management SoC." and "The lower GPIO pins are configured to be output and accessible to the user project."
  * `reg_mprj_io[31:16] = GPIO_MODE_MGMT_STD_OUTPUT`
  * `reg_mprj_io[15:7 & 5:0] = GPIO_MODE_USER_STD_OUTPUT`
  * `reg_mprj_io[6] = UART Tx line`
* `#define reg_mprj_datal (*(volatile uint32_t*)0x2600000c)`
* The `b` in `reg_la0_oenb` stands for "bar" or "negated", which means the signal is **active low**. When a bit in `oenb` is 0, the corresponding LA line is enabled for output (i.e., user_project drives it). When the bit is 1, the output is disabled (tri-stated).
* Similarly, the `iena` register is **active high**: when a bit is 1, the corresponding LA line is enabled for input (i.e., user_project reads from it).

### counter_la_tb.v
#path : testbench/counter_la/counter_la_tb.v
```verilog
module counter_la_tb;
    reg clock;
    reg RSTB;
    reg CSB;
    reg power1, power2;

    wire gpio;
    wire uart_tx;
    wire [37:0] mprj_io;
    wire [15:0] checkbits;

    assign checkbits = mprj_io[31:16];
    assign uart_tx = mprj_io[6];

    always #12.5 clock <= (clock === 1'b0);

    initial begin
        clock = 0;
    end
```
* `mprj_io[31:16]` carries the upper half of `reg_mprj_datal`, but I can't find where the `.c` and the `.v` are explicitly linked.
```verilog
    initial begin
        wait(checkbits == 16'hAB40);
        $display("LA Test 1 started");
        wait(checkbits == 16'hAB41);
        wait(checkbits == 16'hAB51);
        $display("LA Test 2 passed");
        #10000;
        $finish;
    end
```
* `checkbits` follows the values that `counter_la.c` writes to `reg_mprj_datal` (`0xAB40`, `0xAB41`, and `0xAB51` in the upper 16 bits).
```verilog
    caravel uut (
        .clock    (clock),
        .gpio     (gpio),
        .mprj_io  (mprj_io),
        .flash_csb(flash_csb),
        .flash_clk(flash_clk),
        .flash_io0(flash_io0),
        .flash_io1(flash_io1),
        .resetb   (RSTB)
    );

    spiflash spiflash (
        .ap_clk(clock),
        .ap_rst(RSTB),
        .romcode_Addr_A(romcode_Addr_A),
        .romcode_EN_A(romcode_EN_A),
        .romcode_WEN_A(romcode_WEN_A),
        .romcode_Din_A(romcode_Din_A),
        .romcode_Dout_A(romcode_Dout_A),
        .romcode_Clk_A(romcode_Clk_A),
        .romcode_Rst_A(romcode_Rst_A),
        .csb(flash_csb),
        .spiclk(flash_clk),
        .io0(flash_io0),
        .io1(flash_io1)
    );
```
* This shows the interaction between the CPU (inside `caravel`) and `spiflash` through the `flash_*` signals.
```verilog
    bram #(
        .FILENAME("counter_la.hex")
    ) bram (
        .CLK(clock),
        .WE0(romcode_WEN_A),
        .EN0(romcode_EN_A),
        .Di0(romcode_Din_A),
        .Do0(romcode_Dout_A),
        .A0(romcode_Addr_A)
    );
```
:::info
Very important: the `spiflash` module in the Caravel Verilog simulation platform emulates an external SPI flash that holds the compiled program file (usually a `.hex` or `.bin`); here its contents come from the `bram` instance loaded with `counter_la.hex`. According to `sections.lds`, `.text` stays in `flash`, so the CPU fetches its instructions from this flash, and only `.data` is copied into RAM by the startup code in `crt0_vex.S`.
:::

### spiflash.v
#path : rtl/soc/spiflash.v
```verilog
assign io1 = outbuf[7];

assign romcode_Addr_A = {8'b0, spi_addr};
assign romcode_Din_A = 32'b0;
assign romcode_EN_A = (bytecount >= 4);
assign romcode_WEN_A = 4'b0;
assign romcode_Clk_A = ap_clk;
assign romcode_Rst_A = ap_rst;
```
* `romcode_WEN_A = 4'b0` and `romcode_Din_A = 32'b0` indicate that the BRAM is only ever read.

:::warning
Here the BRAM is inside the spiflash in the FPGA.
:::
```verilog
wire [7:0] memory;
assign memory = (spi_addr[1:0] == 2'b00) ? romcode_Dout_A[7:0] :
                (spi_addr[1:0] == 2'b01) ? romcode_Dout_A[15:8] :
                (spi_addr[1:0] == 2'b10) ? romcode_Dout_A[23:16] :
                                           romcode_Dout_A[31:24] ;
```
* `spi_addr[1:0]` selects which byte of the 32-bit `romcode_Dout_A` word is presented on `memory`, the byte that the spiflash sends out next.

| `spi_addr[1:0]` | Chosen Byte | `romcode_Dout_A`        |
|:---------------:|:-----------:| ----------------------- |
| `2'b00`         | Byte0       | `romcode_Dout_A[7:0]`   |
| `2'b01`         | Byte1       | `romcode_Dout_A[15:8]`  |
| `2'b10`         | Byte2       | `romcode_Dout_A[23:16]` |
| `2'b11`         | Byte3       | `romcode_Dout_A[31:24]` |

```verilog
always @(negedge spiclk or posedge csb) begin
    if(csb) begin
        outbuf <= 0;
    end else begin
        outbuf <= {outbuf[6:0],1'b0};
        if(bitcount == 0 && bytecount >= 4) begin
            outbuf <= memory;
        end
    end
end
```
* `csb = 1` means that the spiflash is deselected, so `outbuf` is cleared.
* On each `negedge spiclk`, the output buffer `outbuf` is shifted left so that the next bit is driven onto `io1`.
```verilog
wire [7:0] buffer_next = {buffer[6:0], io0};

always @(posedge spiclk or posedge csb) begin
    // csb deassert -> reset internal states
    if (csb) begin
        buffer <= 0;
        bitcount <= 0;
        bytecount <= 0;
    end else begin
        // csb active -> count bit, byte
        buffer <= buffer_next;
        bitcount <= bitcount + 1;
        if (bitcount == 7) begin
            bitcount <= 0;
            bytecount <= bytecount + 1;
            // spi_action;
            if(bytecount == 0) spi_cmd <= buffer_next; // command
            if(bytecount == 1) spi_addr[23:16] <= buffer_next;
            if(bytecount == 2) spi_addr[15:8] <= buffer_next;
            if(bytecount == 3) spi_addr[7:0] <= buffer_next;
            if(bytecount >= 4 && spi_cmd == 'h03) begin
                // buffer <= memory;
                spi_addr <= spi_addr + 1;
            end
        end
    end
end
```
* The first byte shifted into `buffer` is the command, and the bytes received at `bytecount == 1~3` form the 24-bit address used to fetch data from the BRAM.
* Because `bytecount` keeps increasing and never returns to 0~3 within a transaction, **the first four `if` statements** each run once: they capture the command and the start address in the BRAM. After that, the address advances through `spi_addr <= spi_addr + 1;`.

|  Name  | Direction |                Description                 |          Clock edge          |
|:------:|:---------:|:------------------------------------------:|:----------------------------:|
| buffer |   input   | Used to receive bits sent from the master  | Received on `posedge spiclk` |
| outbuf |  output   | Used to send data bit-by-bit to the master |   Sent on `negedge spiclk`   |

* `io0` is the bit sent from the master. It is only meaningful during the first four bytes (cmd + address).
* `spi_cmd == 'h03` is the read command.
```verilog
if(bitcount == 0 && bytecount >= 4) begin
    outbuf <= memory;
end
```
* The `if` statement above is required because of `assign romcode_EN_A = (bytecount >= 4);`: the output buffer only starts to load 8-bit data from the BRAM after the start address has been received. The `bitcount == 0` condition ensures that `outbuf` is reloaded exactly when `memory` presents the next byte of data.
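
The receive side of the read transaction described above can be sketched as a small C model. This is only a behavioral approximation of the `posedge spiclk` block plus the byte-select mux, assuming `spi_addr` is 24 bits wide (suggested by `{8'b0, spi_addr}`); the type `spiflash_t` and the helpers `spiflash_posedge`, `spiflash_send_byte`, and `select_byte` are names invented for this illustration, and the `negedge` output shifting is not modeled.

```c
#include <stdint.h>

/* Hypothetical software model of the spiflash receive state. */
typedef struct {
    uint8_t  buffer;     /* shift register for bits arriving on io0 */
    uint8_t  bitcount;   /* 0..7 within the current byte */
    uint32_t bytecount;  /* bytes received since csb went low */
    uint8_t  spi_cmd;    /* first byte: command */
    uint32_t spi_addr;   /* assumed 24-bit byte address */
} spiflash_t;

/* One rising edge of spiclk with csb low (mirrors the posedge block). */
void spiflash_posedge(spiflash_t *s, int io0)
{
    uint8_t next = (uint8_t)((s->buffer << 1) | (io0 & 1));
    s->buffer = next;
    if (s->bitcount != 7) {
        s->bitcount++;
        return;
    }
    s->bitcount = 0;
    switch (s->bytecount) {
    case 0: s->spi_cmd = next; break;
    case 1: s->spi_addr = (s->spi_addr & 0x00FFFFu) | ((uint32_t)next << 16); break;
    case 2: s->spi_addr = (s->spi_addr & 0xFF00FFu) | ((uint32_t)next << 8); break;
    case 3: s->spi_addr = (s->spi_addr & 0xFFFF00u) | next; break;
    default:
        /* After cmd + address, a read (0x03) auto-increments the address. */
        if (s->spi_cmd == 0x03)
            s->spi_addr = (s->spi_addr + 1) & 0xFFFFFFu;
        break;
    }
    s->bytecount++;
}

/* Clock one byte in, most significant bit first. */
void spiflash_send_byte(spiflash_t *s, uint8_t byte)
{
    for (int i = 7; i >= 0; i--)
        spiflash_posedge(s, (byte >> i) & 1);
}

/* The `memory` mux: pick one byte of the 32-bit BRAM word. */
uint8_t select_byte(uint32_t romcode_dout, uint32_t spi_addr)
{
    return (uint8_t)(romcode_dout >> (8 * (spi_addr & 3u)));
}
```

In this model, sending `0x03` followed by the address bytes `0x00 0x01 0x02` leaves `spi_addr` at `0x000102`, so the mux selects bits `[23:16]` of the BRAM word; eight more clocks advance the address to `0x000103` and the mux moves on to bits `[31:24]`.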
### user_proj_example.counter.v
#path : rtl/user/user_proj_example.counter.v
#### module counter
```verilog
module counter #(
    parameter BITS = 32
)(
    input clk,
    input reset,
    input valid,
    input [3:0] wstrb,
    input [BITS-1:0] wdata,
    input [BITS-1:0] la_write,
    input [BITS-1:0] la_input,
    output reg ready,
    output reg [BITS-1:0] rdata,
    output reg [BITS-1:0] count
);
    //reg ready;
    //reg [BITS-1:0] count;
    //reg [BITS-1:0] rdata;

    always @(posedge clk) begin
        if (reset) begin
            count <= 0;
            ready <= 0;
        end else begin
            ready <= 1'b0;
            if (~|la_write) begin
                count <= count + 1;
            end
            if (valid && !ready) begin
                ready <= 1'b1;
                rdata <= count;
                if (wstrb[0]) count[7:0]   <= wdata[7:0];
                if (wstrb[1]) count[15:8]  <= wdata[15:8];
                if (wstrb[2]) count[23:16] <= wdata[23:16];
                if (wstrb[3]) count[31:24] <= wdata[31:24];
            end else if (|la_write) begin
                count <= la_write & la_input;
            end
        end
    end

endmodule
```
* `|la_write` is a reduction OR: all the bits of `la_write` are ORed together. So `~|la_write` is equivalent to `la_write == 0`, and `|la_write` is equivalent to `la_write != 0`.
* `la_write` is the control signal. In `count <= la_write & la_input;`, the bits of `la_input` whose `la_write` bit is 1 are written into `count`, and the remaining bits of `count` become 0.
* It is important that **`valid`, `wstrb`, and `wdata` come from Wishbone**, while **`la_write` and `la_input` come from the Logic Analyzer (CPU)**.
* To summarize: if the LA isn't writing, `count` behaves as a free-running counter. If Wishbone (which can be thought of as similar to AXI-Lite) handshakes, the module returns the current `count` value on `rdata` and, for a write (the bytes selected by `wstrb`), updates `count` on the next cycle.
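
The priority between counting, Wishbone access, and LA writes can be checked with a cycle-level C model. This is a sketch assuming `BITS = 32`; `counter_t` and `counter_posedge` are names made up for this illustration, and each call stands for one `posedge clk`, computing the next state from the old state the way the non-blocking assignments do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the counter's registers. */
typedef struct {
    uint32_t count;
    uint32_t rdata;
    bool     ready;
} counter_t;

/* One rising clock edge of the counter module. */
void counter_posedge(counter_t *c, bool reset, bool valid, uint8_t wstrb,
                     uint32_t wdata, uint32_t la_write, uint32_t la_input)
{
    if (reset) {
        c->count = 0;
        c->ready = false;
        return;
    }

    uint32_t next_count = c->count;
    bool next_ready = false;              /* ready <= 1'b0 by default */

    if (la_write == 0)                    /* ~|la_write: free-running count */
        next_count = c->count + 1;

    if (valid && !c->ready) {             /* Wishbone access */
        next_ready = true;
        c->rdata = c->count;              /* read returns the old count */
        for (int i = 0; i < 4; i++) {
            if (wstrb & (1u << i)) {      /* byte lanes override the increment */
                uint32_t mask = 0xFFu << (8 * i);
                next_count = (next_count & ~mask) | (wdata & mask);
            }
        }
    } else if (la_write != 0) {           /* LA write */
        next_count = la_write & la_input;
    }

    c->count = next_count;
    c->ready = next_ready;
}
```

A Wishbone access takes two cycles in this model: `ready` is high for one cycle and must drop again before the next access is accepted, so a back-to-back `valid` only takes effect on every other edge.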
#### module user_proj_example
```verilog
assign valid = wbs_cyc_i && wbs_stb_i;
assign wstrb = wbs_sel_i & {4{wbs_we_i}};
assign wbs_dat_o = rdata;
assign wdata = wbs_dat_i;

// IO
assign io_out = count;
assign io_oeb = {(`MPRJ_IO_PADS-1){rst}};

// IRQ
assign irq = 3'b000; // Unused

// LA
assign la_data_out = {{(127-BITS){1'b0}}, count};
// Assuming LA probes [63:32] are for controlling the count register
assign la_write = ~la_oenb[63:32] & ~{BITS{valid}};
// Assuming LA probes [65:64] are for controlling the count clk & reset
assign clk = (~la_oenb[64]) ? la_data_in[64]: wb_clk_i;
assign rst = (~la_oenb[65]) ? la_data_in[65]: wb_rst_i;
```
* Each bit of `la_write` is 1 only when the matching bit of `la_oenb[63:32]` is 0 and `valid` is 0. In other words, the LA is writing that bit and no Wishbone handshake is in progress.
* `clk` and `rst` are driven by either the LA or Wishbone: when `la_oenb[64]` (or `la_oenb[65]`) is 0 the LA value is used, otherwise `wb_clk_i` (or `wb_rst_i`).
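
The combinational assignments above map directly onto bitwise C expressions. The following sketch assumes `BITS = 32`; the function names `wb_valid`, `wb_wstrb`, `la_write_mask`, and `select_signal` are hypothetical and only restate the `assign` statements.

```c
#include <stdbool.h>
#include <stdint.h>

/* valid = wbs_cyc_i && wbs_stb_i */
bool wb_valid(bool cyc, bool stb)
{
    return cyc && stb;
}

/* wstrb = wbs_sel_i & {4{wbs_we_i}}: byte enables only count on a write. */
uint8_t wb_wstrb(uint8_t sel, bool we)
{
    return (uint8_t)(sel & (we ? 0xFu : 0x0u));
}

/* la_write = ~la_oenb[63:32] & ~{BITS{valid}}, evaluated per bit. */
uint32_t la_write_mask(uint32_t la_oenb_63_32, bool valid)
{
    return ~la_oenb_63_32 & (valid ? 0u : 0xFFFFFFFFu);
}

/* clk / rst mux: the LA bit wins when its oenb bit is 0. */
bool select_signal(bool la_oenb_bit, bool la_data_bit, bool wb_signal)
{
    return !la_oenb_bit ? la_data_bit : wb_signal;
}
```

For instance, `la_write_mask(0x0000FFFF, false)` gives `0xFFFF0000`, and any call with `valid` set gives 0, which is what keeps the LA from overriding `count` during a Wishbone access.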