# FSIC (Full-Stack IC) Design / Architecture

References: [fsic_fpga --- from bol-edu](https://github.com/bol-edu/fsic_fpga)

[TOC]

## FSIC Architecture
- FSIC is an architecture for implementing an IC validation system based on the Caravel SoC.

**Caravel SoC**
![Screenshot 2024-01-23 233929](https://hackmd.io/_uploads/SJrT6UpY6.png =70%x)

**Block Diagram of FSIC**
![Screenshot 2024-01-23 234350](https://hackmd.io/_uploads/S17A0UatT.png)
- We will embed the application accelerator in `USER_SUBSYS` of FSIC.

**Caravel-FSIC + FPGA-FSIC**
![Screenshot 2024-01-23 234748](https://hackmd.io/_uploads/ByxpkD6KT.png)
- FSIC acts as a bridge between the Caravel SoC and the FPGA, which also helps us with verification.

## FSIC-AXIS Interface Specification
- Extension of the AXI-Stream specification to include
    1. AXI-Lite configuration transaction (AXI-Lite Overload)
    2. Data payload for AXI-Stream transaction

![Screenshot 2024-01-24 000604](https://hackmd.io/_uploads/S1O8NPTKT.png)

1. AXI-Lite configuration transaction
    - AXI-Lite configuration can be upstream or downstream.
    - Downstream: the remote host accesses the AXI-Lite registers in FSIC (user logic, mailbox) via **IO_Serdes**.
    - Upstream: the master in FSIC (mailbox) accesses the AXI-Lite registers in the remote host (mailbox) via **IO_Serdes**.
2. Data payload for AXI-Stream transaction
    - Downstream data payload goes from the AXI-S master in the remote host to the user logic (AXI-S slave) of FSIC.
    - Upstream data payload goes from the AXI-S master (user logic, logic analyzer) of FSIC to the AXI-S slave in the remote host.
- Data flow directions of AXI-Stream:
    - Upstream: data flows from Caravel SoC to FPGA/Memory
    ![Screenshot 2024-01-23 235646](https://hackmd.io/_uploads/Skh0bvTK6.png =70%x)
    - Downstream: data flows from FPGA/Memory to Caravel SoC
    ![Screenshot 2024-01-23 235657](https://hackmd.io/_uploads/rJM1GD6t6.png =70%x)

## System Clocking Scheme
![Screenshot 2024-01-24 004058](https://hackmd.io/_uploads/B1-V2P6Yp.png =80%x)
- Clock Skew Control
    - The Caravel chip and the FPGA run **synchronously** on both `core_clk` and `io_clk`, i.e. the skew of `core_clk` between the Caravel chip and the FPGA is controlled, and so is the skew of `io_clk`.
    - The skew is controlled by board layout and matched IO buffer delay. Within the Caravel SoC and the FPGA, the skew between `core_clk` and `io_clk` is also controlled by a balanced clock tree layout.
    - A detailed post-layout static timing analysis guarantees that there is no hold time violation when a signal passes between `core_clk` and `io_clk`. A synchronizer using the **negative** `io_clk` edge is one method to ensure sufficient/safe hold time. (See the part "Module Specification - IO_SERDES")

## Module Specification
- All modules are wrapped inside the User Project Wrapper
- The User Project Wrapper includes
    1. One or more user projects. There is only one active user project in a running system.
    2. Integrator logic, including the aggregator and disaggregator for instrumentation tools, including
        1. LogicAnalyzer
        2. Tester
    3. Protocol Converter
        1. Config_Ctrl (WB_Axilite)
        2. Axis_AxiLite
        3. IO serialization logic - IO_Serdes
    4. Communication between Caravel and FPGA
        1. Mailbox
        2. Axis_Switch

### Config_Ctrl (CC)
![CC_Block](https://hackmd.io/_uploads/H1kFnqRKp.png =120%x)
- Generates AXI-Lite transactions to configure all modules in the user project wrapper
- Function:
    - Wishbone (WB) to AXI-Lite Conversion (WB-AXI)
    - Arbitrate configuration from the Caravel side and the FPGA side
    - Target decoding, generating the `cc_target_enable` signal
    - Each module has an address range of 4K (3000_?xxx).

#### Address Map
| Target Module | Address Range | Enable Signal |
| ------------- | ------------- | ------------- |
| User Projects | 32'h3000_0xxx | `cc_up_enable`|
| Logic Analyzer| 32'h3000_1xxx | `cc_la_enable`|
| Axis_Axilite  | 32'h3000_2xxx | `cc_aa_enable`|
| IO_Serdes     | 32'h3000_3xxx | `cc_is_enable`|
| Axis_Switch   | 32'h3000_4xxx | `cc_as_enable`|
| Config_Ctrl   | 32'h3000_5xxx |               |

MailBox(MB): 2000~201F
Axilite_Axis(AA): 2100~2107

#### In `caravel.h`
[header file of Caravel](https://github.com/bol-edu/fsic_fpga/blob/main/firmware/caravel.h)
![Screenshot 2024-01-26 021845](https://hackmd.io/_uploads/B1lB8Qe56.png)

#### Configuration Register - CC
- Configuration Control Group: 32'h3000_5000 ~ 32'h3000_5FFF

![Screenshot 2024-01-24 223232](https://hackmd.io/_uploads/S1tqys0Yp.png =70%x)

### AXILite-AXIS (AA)
- Protocol Conversion between AXI-Lite <-> AXI-Stream

![Screenshot 2024-01-24 223821](https://hackmd.io/_uploads/r10kZoAtp.png =40%x)

#### Features
1. Converts AXI-Lite transactions to modified AXI-Stream transactions, and converts AXI-Stream back to AXI-Lite. (**AXI-Lite <---> AXI-Stream**)
2. Syncs the **mailbox** in Caravel and FPGA; it can generate an interrupt with status if the mailbox is written by a transaction from the other side.
3. Maintains access to the modules in the Caravel user_project_wrapper from the FPGA/PS side.
4. Because the mailbox base address in the address map may change on the FPGA side, this module implements a **remap control register / remap address register** for software configuration.
#### Functions
``` py=
# addr is the address received by s_axilite, could be from CC or FPGA/PS
if addr in range(0x3000_0000, 0x3000_3FFF): # mailbox address space
    if Caravel:
        if READ: # path 1
            => read Caravel mailbox
        elif WRITE: # path 2
            => write Caravel mailbox
            => write FPGA mailbox (raise interrupt in FPGA)
    elif FPGA:
        fpga_mb_addr = addr
        if REMAP:
            fpga_mb_addr += remap_base_addr
        if READ: # path 3
            => read FPGA mailbox using `fpga_mb_addr`
        elif WRITE: # path 4
            => write FPGA mailbox using `fpga_mb_addr`
            => write Caravel mailbox using `addr` (raise interrupt in Caravel)
elif addr in range(0x3000_0000, 0x3FFF_FFFF): # Caravel address space
    if Caravel:
        => NOP (handled by CC)
    elif FPGA:
        if READ: # path 5
            => read Caravel
        elif WRITE: # path 6
            => write Caravel (no interrupt in Caravel)
```

#### Some Registers in AA
| Register Name | Offset Address | Description |
| ------------- | -------------- | ----------- |
| `remap_enable`| 'h0 | Enables remap on the FPGA side; this module then sends address + base address offset from the remap address register. 1'b0: (default) remap disabled. 1'b1: remap enabled |
| `remap_base_addr` | 'h0 | Base address for the address map on the FPGA side. 32'h0 (default) |
| `interrupt_enable` | 'h0 | Enables the interrupt when a **write** transaction from the other side is sent to the mailbox. 1'b0: (default) disabled. 1'b1: interrupt enabled |
| `mb_wr` | 'h0 | The mailbox was written by a transaction from the other side. 1'b0: (default) no **write** transaction from the other side. 1'b1: mailbox is written |

#### Optimization
- This design is somewhat complex, and the FSIC infrastructure itself should not be larger than the designs on the FPGA side and the Caravel SoC side. If it grows too large, it overshadows the main components.
- So, we can simplify the design of AXILite-AXIS.
- [FSIC-AA-Optimization](/gncMvHJJT1GYZVH5j8NE-g) will be published when we finish.
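As a runnable companion to the AA pseudocode in the Functions subsection, here is a minimal Python sketch of the same six paths. It is not the RTL behavior, only a model under assumptions: `aa_route`, the `Side`/`Op` enums and the action tuples are hypothetical names, the two address windows are copied unchanged from the pseudocode, and path 2 is assumed to mirror the write to the FPGA mailbox at the unmodified `addr`.

```python
from enum import Enum


class Side(Enum):
    CARAVEL = "caravel"
    FPGA = "fpga"


class Op(Enum):
    READ = "read"
    WRITE = "write"


# Address windows copied verbatim from the pseudocode (Python range semantics).
MAILBOX_SPACE = range(0x3000_0000, 0x3000_3FFF)
CARAVEL_SPACE = range(0x3000_0000, 0x3FFF_FFFF)


def aa_route(addr, side, op, remap_enable=False, remap_base_addr=0):
    """Return (path_number, actions) for one access arriving at s_axilite.

    Each action is a hypothetical (operation, target, address) tuple.
    """
    if addr in MAILBOX_SPACE:
        if side is Side.CARAVEL:
            if op is Op.READ:
                return 1, [("read", "caravel_mailbox", addr)]
            # Path 2: local write, then mirror to the FPGA and interrupt it.
            return 2, [("write", "caravel_mailbox", addr),
                       ("write", "fpga_mailbox", addr),
                       ("interrupt", "fpga", addr)]
        # FPGA side: optionally relocate the mailbox address first.
        fpga_mb_addr = addr + remap_base_addr if remap_enable else addr
        if op is Op.READ:
            return 3, [("read", "fpga_mailbox", fpga_mb_addr)]
        return 4, [("write", "fpga_mailbox", fpga_mb_addr),
                   ("write", "caravel_mailbox", addr),
                   ("interrupt", "caravel", addr)]
    if addr in CARAVEL_SPACE:
        if side is Side.CARAVEL:
            return None, []  # NOP: Config_Ctrl handles this access
        if op is Op.READ:
            return 5, [("read", "caravel", addr)]
        return 6, [("write", "caravel", addr)]  # no interrupt raised
    raise ValueError(f"address {addr:#010x} is outside both windows")
```

Calling `aa_route(0x3000_0010, Side.FPGA, Op.WRITE, remap_enable=True, remap_base_addr=0x1000)` selects path 4 and shows that only the FPGA-side mailbox address is shifted by the remap base, while the Caravel-side write keeps the original `addr`.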
### AXI-SWITCH (AS)
- Data flow directions of AXI-Stream:
    - **Upstream**: data flows from Caravel SoC to FPGA/Memory
    ![Screenshot 2024-01-23 235646](https://hackmd.io/_uploads/Skh0bvTK6.png =70%x)
    - **Downstream**: data flows from FPGA/Memory to Caravel SoC
    ![Screenshot 2024-01-23 235657](https://hackmd.io/_uploads/rJM1GD6t6.png =70%x)
- Data producers and consumers include
    - User Project (Caravel)
    - Extended User Project (FPGA)
    - Logic Analyzer (Caravel)
    - Axilite-Axis (Caravel, FPGA)
    - AxiDMA (FPGA)

![Screenshot 2024-01-24 000604](https://hackmd.io/_uploads/S1O8NPTKT.png)

#### Demultiplexer with TID
![Screenshot 2024-01-25 104611](https://hackmd.io/_uploads/BkJ9iHkc6.png)

#### Definition of `TID[1:0]`
- `TID[1:0]` identifies the source/destination of the current streaming path (upstream/downstream)

| Direction  | `TID[1:0]` | Source Module | Destination Module |
| ---------- | -- | ----------------------- | --------- |
| Downstream | 00 | User DMA (M_AXIS_MM2S) in remote host (optionally the extended user project) | User Project - the current active user project |
| Downstream | 01 | Axilite Master R/W in remote host (includes Mailbox **write**) | Axis-Axilite (includes Mailbox) |
| Upstream   | 00 | User Project - the current active user project | User DMA (S_AXIS_S2MM) in remote host (optionally the extended user project) |
| Upstream   | 01 | Axis-Axilite (for Mailbox) | Axilite slave in remote host (for mailbox **write**) |
| Upstream   | 10 | Logic Analyzer | Logic Analyzer data receiver - DMA (S_AXIS_S2MM) in remote host |

#### Round-Robin Arbitration - MUX
![Screenshot 2024-01-25 104505](https://hackmd.io/_uploads/B138jr15p.png)

#### Introduction to Round-Robin Arbitration
- Round Robin is an arbitration algorithm designed for fairness. The fundamental idea is that when a requestor obtains a grant, its priority in subsequent arbitration becomes the lowest. In other words, the priority of each requestor is not fixed: it drops to the lowest after the requestor receives a grant, and it shifts as grants are given to other requestors. Therefore, when multiple requestors are present, grants are allocated to each requestor in turn. Even if a requestor that originally had higher priority submits a new request, it waits until the pending requestors ahead of it have received grants.
- Let's illustrate this with an example of four requestors. In the table below, **`Req[3:0]`** indicates the actual requests, where 1 represents a request. **RR Priority** shows the current priority of each requestor, one digit per requestor listed from `req[3]` down to `req[0]`, with ***0 being the highest and 3 being the lowest***. **`RR_Grant[3:0]`** shows the grant given under the current Round Robin priority and requests. **`Fixed_Grant[3:0]`** shows the grant if a fixed priority is followed, i.e., the priority always stays at 3, 2, 1, 0.

|             | `Req[3:0]` | RR Priority | `RR_Grant[3:0]` | `Fixed_Grant[3:0]` |
| ----------- | ---- | ---- | ---- | ---- |
| **Cycle 0** | 0101 | 3210 | 0001 | 0001 |
| **Cycle 1** | 0101 | 2103 | 0100 | 0001 |
| **Cycle 2** | 0011 | 0321 | 0001 | 0001 |
| **Cycle 3** | 0010 | 2103 | 0010 | 0010 |
| **Cycle 4** | 1000 | 1032 | 1000 | 1000 |

- In the first cycle (cycle 0), in the initial state, we assume that `req[0]` has the highest priority, `req[1]` follows, and `req[3]` has the lowest priority (RR priority = 3210). When both `req[2]` and `req[0]` are asserted (1) simultaneously, `req[0]` takes precedence over `req[2]`, and the grant becomes 0001.
- In the second cycle (cycle 1), since `req[2]` did not receive a grant in the previous cycle, it remains asserted (1). Meanwhile, `req[0]` issues a new request. This is where the difference between Round Robin and fixed priority becomes apparent. With fixed priority, the grant stays with `req[0]`, i.e., 0001. In the Round Robin algorithm, however, `req[0]` was granted in the previous cycle, so its priority is now the lowest (3). Consequently, `req[1]`'s priority becomes the highest; `req[1]` is not requesting and `req[0]` is at the lowest priority, so the grant is allocated to `req[2]` instead.
- Similarly, in the third cycle (cycle 2), as `req[2]` was granted in the previous cycle, its priority becomes the lowest (3), and `req[3]`'s priority becomes the highest. We can continue this analysis for the subsequent cycles.

### IO_SERDES (IS)
#### Introduction to SERDES
- SERDES stands for ***Serializer/Deserializer***. It is a technology used in data communications to transmit and receive serialized data over a serial link.
- In many electronic systems, data is transmitted and received in parallel, where each bit of data has its own dedicated wire or channel. However, as data rates increase and the demand for high-speed communication grows, it becomes more efficient to transmit data serially, using fewer wires.
- A SERDES device takes parallel data input, serializes it (converts it into a serial stream), and transmits it over a serial link. On the receiving end, another SERDES device receives the serialized data and deserializes it (converts it back into parallel data). This allows large amounts of data to be transmitted efficiently over high-speed serial links.

#### Block Diagram
- The frequency of `io_clk` is 4 times that of `core_clk`.
- `rxdata[3:0]` is enabled once every 4 `io_clk` cycles, i.e. once per `core_clk` cycle.

![Screenshot 2024-01-25 113207](https://hackmd.io/_uploads/ry5v8Ikqp.png)

#### Clock and Reset
- 3 clocks
    - rxclk/txclk (forwarding clock)
    - io_clk
    - core_clk

**Forwarding clock:**
- Source-synchronous scheme for high performance

![Screenshot 2024-01-25 112734](https://hackmd.io/_uploads/rymESU196.png =80%x)

**Function of core_clk / io_clk**
- The purpose of this module is to virtually increase the number of IO pins by setting the ratio between the core clock and the IO clock. In the following diagram, there are **`m * core_signals`** going to IO, and there are **`n * io_pins`**.
To match the throughput, the design needs to satisfy the equation: **`m * core_clk = n * io_clk`**.

![Screenshot 2024-01-25 113106](https://hackmd.io/_uploads/rylLWU8y5a.png =80%x)

#### Transmit / Receive Timing
- The Tx side outputs `serial_data` aligned with the `txclk` **rising edge**.
- The Rx side uses the `rxclk` **negative edge** to sample `serial_data` into parallel data.

![Screenshot 2024-01-25 113701](https://hackmd.io/_uploads/rysDPIk5T.png =70%x)

#### Deserialize: Serial-In, Parallel-Out
- Serial Data-in to RxFIFO -> use w_ptr
- RxFIFO to `rx_shift_reg` -> use r_ptr
- `rx_shift_reg` to `rx_data_out`
    - move when `(rx_shift_phase_cnt == 3)`
- Static Timing Analysis Issues
    - Cross-clock domain: `rxclk = core_clk`
    - Source Synchronous: `rxclk` & data.
    - `core_clk` is used to sample the 4-bit data; this is why we divide `io_clk` by 4.

![Screenshot 2024-01-25 113855](https://hackmd.io/_uploads/H1bJu8J5T.png =70%x)

#### Flow Control by **axis_switch**
- Use `as_is_tready` and `is_as_tready` for flow control
    - `as_is_tready`: when the local side axis_switch has `RxFIFO size <= threshold`, then `as_is_tready = 0`. This notifies the remote side not to provide data with `is_as_tvalid = 1`.
    - `is_as_tready`: when the remote side axis_switch has `RxFIFO size <= threshold`, then `is_as_tready = 0`. This notifies the local side not to provide data with `as_is_tvalid = 1`.

![Screenshot 2024-01-25 122252](https://hackmd.io/_uploads/HJyNMwkcp.png =95%x)

### Logic_Analyzer (LA)
[Logic_Analyzer](https://github.com/bol-edu/fsic_fpga/blob/main/rtl/user/logic_analyzer)

#### Block Diagram
![Screenshot 2024-01-25 132852](https://hackmd.io/_uploads/BJVAbuy9a.png)

#### Function
1. Monitor signals provided by the user project. Up to 24 monitoring signals are supported.
    ![Screenshot 2024-01-26 025016](https://hackmd.io/_uploads/Syhdp7ec6.png =40%x)
2. Support signal conditioning to trigger signal logging (currently done by the host program)
3. Compress the logged signals (waveform compression, using Run-Length Encoding, RLE) and send them to remote users through the AXIS port. The waveform can be displayed in remote environments.
    - Note: waveform compression (RLE) with a customized coding scheme is used to save data bandwidth and internal data buffers.
- Notice that the Logic_Analyzer function is different from the Caravel LogicAnalyzer function (signals: `la_xx`). In Caravel, LogicAnalyzer signals are used as general-purpose GPIO signals controlled by RISC-V.

#### Introduction to RLE
Run-Length Encoding (RLE) is a simple form of data compression that represents consecutive sequences of identical elements in a more compact way. In RLE, instead of listing every individual element, you count how many times the same element occurs consecutively and represent the run as a single value together with the count.
- Here's a basic example:

Original Sequence:
```
A A A B B C C C C D
```
RLE representation:
```
3A 2B 4C 1D
```
- In this example, instead of writing out "A A A," we represent it as "3A." The same goes for consecutive occurrences of other elements. The numbers indicate how many times the corresponding element repeats consecutively.
- RLE is particularly useful for data with long sequences of repeated values, as it reduces redundancy and so allows more efficient storage or transmission.

#### Packet Format
*Packet size is `32-bit = {RC[7:0], Data[23:0]}`*
- RC (Repeat Count): starts at 1 and ends at 255.
- If RC > 255, the packet is released and a new count starts.
- Packet = 32'd0 indicates FIFO overflow, because RC starts at 1 and a normal packet is therefore never all zeros.
- Maximum transfer period
    - 8-bit = 256 counts
    - If the clock frequency is 100 MHz: 10 ns * 256 = 2.56 us/packet

#### Configuration Register
| Register Name | Offset Address | Description |
| ------------- | -------------- | ----------- |
| `ctrl` | 'h00 | bit[23:0] - the bitmap maps to `up_la_data[23:0]`. A 1'b1 means the corresponding `up_la_data` signal will be monitored. Logic Analyzer signal monitoring is disabled if all bits are 0. Default is 24'h000000. |
| `la_hprj_high_th` | 'h04 | bit[6:0] - Threshold to enable `la_hpti_reg`. Default is 7'h40 |
| `la_hprj_low_th` | 'h08 | bit[6:0] - Threshold to disable `la_hpti_reg`. Default is 7'h10 |
| `axis_pkt_len` | 'h0C | bit[6:0] - A group of bytes that are transported together across an AXI-Stream interface. Default is 7'h08 |

### MAIL_BOX (MB)
- Exchanges messages between the Caravel SoC and the FPGA
- There is no separate MailBox module. It is simply a set of registers embedded in `axi_ctrl_logic.sv`
``` verilog=
logic [31:0] mb_regs [7:0]; // 32-bit * 8
```

#### Interface Block
![Screenshot 2024-01-25 133814](https://hackmd.io/_uploads/ryX0Xu19a.png =80%x)

#### Functions
- The mailbox is a set of registers which provides a communication channel between Caravel/RISC-V and FPGA/ARM. The operation mechanism is as follows:
    1. The mailbox registers are duplicated in the Caravel chip and in the FPGA.
    2. When one side (either Caravel or FPGA) writes to the mailbox, the transaction is passed to the other side.
    3. When the mailbox is updated, an **interrupt** is generated to the other side.
    4. The mailbox address is defined in the user address space in **32'h3300_0000 - 32'h33FF_FFFF**.
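The mirrored-register mechanism above can be sketched as a small Python model. This is an illustration under assumptions, not the RTL: `MailboxSide` and `make_mailbox_pair` are hypothetical names, the "message cycle" is modeled as a direct update of the peer copy, and the interrupt is always raised (the `interrupt_enable` gating in AA is ignored).

```python
class MailboxSide:
    """One copy of the 8 x 32-bit mailbox registers (Caravel or FPGA)."""

    def __init__(self, name):
        self.name = name
        self.regs = [0] * 8        # mirrors mb_regs[7:0]
        self.irq_pending = False   # set when the *other* side writes
        self.peer = None           # filled in by make_mailbox_pair()

    def write(self, index, value):
        value &= 0xFFFF_FFFF             # registers are 32 bits wide
        self.regs[index] = value         # update the local copy
        self.peer.regs[index] = value    # message cycle updates the remote copy
        self.peer.irq_pending = True     # the remote side is interrupted

    def read(self, index):
        return self.regs[index]          # reads are served locally (peer: NOP)


def make_mailbox_pair():
    """Create the Caravel and FPGA copies and cross-link them."""
    caravel, fpga = MailboxSide("caravel"), MailboxSide("fpga")
    caravel.peer, fpga.peer = fpga, caravel
    return caravel, fpga


caravel, fpga = make_mailbox_pair()
caravel.write(0, 0xDEADBEEF)
print(hex(fpga.read(0)), fpga.irq_pending, caravel.irq_pending)
# prints: 0xdeadbeef True False
```

The writer never interrupts itself: only the peer's `irq_pending` flag is set, which matches the rule that the interrupt goes to the other side.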
#### Mailbox Operation
| Operation | Caravel Side | FPGA Side |
| --------- | ------------ | --------- |
| Caravel Write | Update mailbox, generate message cycle to FPGA | Update mailbox, generate interrupt to ARM |
| Caravel Read | Return mailbox content | NOP |
| FPGA Write | Update mailbox, generate interrupt to Caravel | Update mailbox, generate message cycle to Caravel |
| FPGA Read | NOP | Return mailbox content |

#### Configuration Registers of Mailbox
- 8 * 32-bit
- Mailbox address range: 15'h2000 - 15'h201F

| Register Name    | Offset Address |
| ---------------- | -------------- |
| `mb_reg_0[31:0]` | 'h0 |
| `mb_reg_1[31:0]` | 'h4 |
| `mb_reg_2[31:0]` | 'h8 |
| `mb_reg_3[31:0]` | 'hC |
| `mb_reg_4[31:0]` | 'h10 |
| `mb_reg_5[31:0]` | 'h14 |
| `mb_reg_6[31:0]` | 'h18 |
| `mb_reg_7[31:0]` | 'h1C |

### User_subsys
![Screenshot 2024-01-25 134204](https://hackmd.io/_uploads/S1xv2EOJ56.png =60%x)
- We can integrate our design in PRJ1
    - In Lab#1, we will integrate FIR into PRJ1 ([user_prj1](https://github.com/bol-edu/fsic_fpga/tree/main/rtl/user/user_subsys/user_prj/user_prj1))
    - Use [user_prj1.v](https://github.com/bol-edu/fsic_fpga/blob/main/rtl/user/user_subsys/user_prj/user_prj1/rtl/user_prj1.v) as our module top interface
    - Define all our submodules in [rtl.f](https://github.com/bol-edu/fsic_fpga/blob/main/rtl/user/user_subsys/user_prj/user_prj1/rtl/rtl.f)
- We should enable PRJ1 in the Caravel SoC, in our firmware code:
    - Program `reg_fsic_cc = 32'h02` (3000_5000)
    - This will direct all

## Lab-fsic-sim: Lab Work
- You can refer to the testbench [tb_fsic.v](https://github.com/TonyHo722/fsic_tony/blob/main/dsn/testbench/tb_fsic.v) in [fsic_tony](https://github.com/TonyHo722/fsic_tony/tree/main).
- Implementation
    1. Integrate FIR into PRJ1 (axilite, axi-stream in/out)
    2. Testbench (modify [fsic_tb.v](https://github.com/bol-edu/fsic_fpga/blob/main/testbench/fsic/fsic_tb.v))
        1. Test#1 - FIR initialization (coefficient, length) from SoC side
            - Use Mailbox to notify FPGA side to start X, Y streaming
        2. Test#2 - FIR initialization from FPGA side
        3. FIR data X, Y stream data from FPGA side
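
When checking the Y stream in the testbench, a software golden model is one common way to produce expected values. The following is a minimal sketch, assuming a direct-form FIR with integer taps and zero initial state; `fir_reference` is a hypothetical helper and the tap and input values are illustrative, not the lab's coefficients.

```python
def fir_reference(taps, x):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], with x[n < 0] = 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:          # samples before the stream start are zero
                acc += h * x[n - k]
        y.append(acc)
    return y


taps = [1, 2, 3]        # illustrative coefficients
x = [1, 0, 0, 4]        # illustrative X stream
print(fir_reference(taps, x))
# prints: [1, 2, 3, 4]
```

The expected list can then be compared sample by sample against the Y values received from PRJ1 in either test.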