# SoC Lab3 Verilog FIR

[Github Link](https://github.com/dqrengg/SoC_Laboratory/tree/main/Lab3)

## FIR Design

### 1. Block Diagram

![blocks](https://hackmd.io/_uploads/rJ5VD2p21g.png)

### 2. FSM

#### State diagram

![states](https://hackmd.io/_uploads/r1b3r0Th1g.png =50%x)

```
init_done = (init_addr == Tape_Num - 1);
cfg_done  = (awaddr == 12'h000) & awvalid & awready & (wdata[0] == 1'b1) & wvalid & wready;
all_done  = sm_tvalid && sm_tready && sm_tlast;
```

#### Logic controlled by FSM

| Signals | INIT | WAIT | CALC |
|---------|------|------|------|
| `Tap_Di` | `rdata` | `rdata` | N/A |
| `Tap_A` | `araddr` | `araddr` | `tap_addr` |
| `Tap_EN` | `awvalid` & `wvalid` \| `arvalid` | `awvalid` & `wvalid` \| `arvalid` | `1` |
| `Tap_WE` | `awvalid` & `wvalid` | `awvalid` & `wvalid` | N/A |
| `Data_Di` | `0` | N/A | `ss_tdata` |
| `Data_A` | `init_addr` | N/A | `data_addr` |
| `Data_EN` | `1` | `0` | `1` |
| `Data_WE` | `1` | `0` | `tap_addr == 0` |

### 3. ap_* protocol

![ap](https://hackmd.io/_uploads/S1uzBLF2kx.png)

#### ap_start

1. Set by the host when it programs `ap_start` = 1.
2. Reset by the engine when the first `X` input is sampled.

#### ap_done

1. Set by the engine when the last `Y` is transferred.
2. Reset by the engine when the host reads address 0x000 over AXI-Lite.

#### ap_idle

1. Reset by the engine when the first `X` input is sampled.
2. Set by the engine when the last `Y` is transferred.

### 4. FIR Core

#### Pipeline stages

1. Address generation
2. SRAM access
3. Multiplication
4. Addition

#### Input/Output control

##### X input

* The Tap address counts down from `10` to `0` in every calculation.
* The Data address starts at `0`. When it reaches `9`, it goes back to `0` in the next cycle.
* While the Tap address is not `0`, the engine reads `x` from SRAM.
* While the Tap address is `0`, the `X` input overwrites the previous data in SRAM and the engine takes `x` from the bypass path.
* No separate pointer is needed to implement the shift register.
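
The X input addressing described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the RTL: it assumes that wrapping after address `9` means the Data SRAM is used as 10 slots holding the 10 previous samples, and that these slots start at zero as written in the INIT state. The names `fir_stream`, `fir_reference` and `DATA_DEPTH` are hypothetical, and the coefficients and inputs are arbitrary.

```python
TAP_NUM = 11                  # tap addresses 10..0, as in the text
DATA_DEPTH = TAP_NUM - 1      # assumption: wrap after address 9 means 10 slots


def fir_stream(taps, xs):
    """Behavioral model of the address sequencing (hypothetical helper, not the RTL)."""
    assert len(taps) == TAP_NUM
    data_ram = [0] * DATA_DEPTH   # zero-filled, as the INIT state writes 0
    data_addr = 0
    ys = []
    for x in xs:
        acc = 0
        for tap_addr in range(TAP_NUM - 1, -1, -1):   # 10 down to 0
            if tap_addr == 0:
                data_ram[data_addr] = x   # X overwrites the oldest sample
                sample = x                # bypass: use X directly
            else:
                sample = data_ram[data_addr]
            acc += taps[tap_addr] * sample
            # Wrap after 9; 11 steps modulo 10 advance the start slot by one.
            data_addr = 0 if data_addr == DATA_DEPTH - 1 else data_addr + 1
        ys.append(acc)
    return ys


def fir_reference(taps, xs):
    """Direct-form convolution with zero initial history, used as a check."""
    return [sum(taps[k] * xs[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(xs))]


taps = list(range(1, TAP_NUM + 1))    # arbitrary illustrative coefficients
xs = list(range(1, 31))               # arbitrary illustrative input
assert fir_stream(taps, xs) == fir_reference(taps, xs)
```

Under these assumptions, each calculation takes 11 address steps over 10 slots, so the starting slot moves forward by one per output. That is what lets the address counter alone act as the shift register: the slot read first (with Tap address `10`) is the same slot that the new `X` overwrites last (with Tap address `0`).
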
##### Y output

* The engine passes the Tap address along the pipeline stages.
* When the Tap address in the final stage (addition) reaches `0`, the `Y` output is ready.

#### Pipeline stall

##### Stall conditions

1. When `X` doesn't come in and the input buffer is empty.
2. When `Y` is not accepted and the output buffer is full.

##### How to resolve stall

Every stage holds its data from the previous cycle.

### 5. AXI-Lite

![axilite](https://hackmd.io/_uploads/HkkfQ9p2kl.png)

* For a **read operation**, `arready` is `1` by default, except in the cases described below. After `arvalid` is received, `rdata` becomes valid in the next cycle. `arready` stays de-asserted until the host accepts `rdata`.
* For a **write operation**, the engine starts processing the write request when both `awvalid` and `wvalid` are asserted, and asserts `awready` and `wready` in the next cycle.
* When read and write requests arrive at the same time, the engine processes the read request first.
* If the current cycle has only a write request, the engine processes it in the next cycle. At that point the engine cannot process a read request, so `arready` is `0`.
* During calculation, the engine ignores any write request, but returns `awready` and `wready` as normal.
* During calculation, the engine returns `32'hFFFF_FFFF` for Tap reads and a valid value for status reads.
* `awaddr` and `rdata` are controlled by the FSM output to handle invalid operations.

### 6. AXIS

#### AXIS buffer

* The buffers are placed between the engine and the `X` input / `Y` output.
* The buffers are designed as zero-delay buffers: while a buffer is empty, data passes through its bypass path.
* The pipeline stalls while the Y buffer is full or the X buffer is empty.

#### Master

![yb](https://hackmd.io/_uploads/SyVx58F3yx.png)

#### Slave

![xb](https://hackmd.io/_uploads/SkWe9Itnke.png)

## Testbench

### 1. Cycle count

* Total 6601 cycles for `data_length` = 600, 11 cycles per `Y` output.
* The cycle count is measured with no AXIS latency.

### 2. AXI-Lite Tap configuration

#### Valid configuration

* The host first writes all Taps into SRAM, then reads them back to verify.

![tb_axi1](https://hackmd.io/_uploads/H1Z4Daahyx.png)

* The host writes and reads back at the same time (write before read for the same address).
* Short latency (0~3 ns)

![tb_axi2](https://hackmd.io/_uploads/rJzhDTph1g.png)

* Long latency (0~2 cycles)

![tb_axi3](https://hackmd.io/_uploads/SJtHdTThJg.png)

#### Invalid configuration

* The host tries to write `Tap` to `0` during calculation; the engine should ignore it.
* The host tries to read `Tap` during calculation; the engine should return `32'hFFFF_FFFF`. (Only the long latency case is shown.)

![tb_axi4](https://hackmd.io/_uploads/rJ40Opp3yg.png)

### 3. AXIS

* Check whether the `Y` outputs match the golden data.
* No latency (11 cycles per `Y` output)

![tb_axis1](https://hackmd.io/_uploads/BJ7E2Ta2kx.png)

* Short latency (11 cycles + 0~3 ns)

![tb_axis2](https://hackmd.io/_uploads/S1qxhTT2kx.png)

* Long latency (11 + 0~2 cycles)

![tb_axis3](https://hackmd.io/_uploads/ryha9aa3yg.png)

### 4. Status check

* The host tries to write `ap_start` = 1, `ap_done` = 1 and `ap_idle` = 1 after the engine has started. The engine should ignore the invalid configuration.

![tb_status1](https://hackmd.io/_uploads/ByGPzRp2Jl.png)

* The host checks the status after the calculation completes.

![tb_status2](https://hackmd.io/_uploads/r1O6fAThkl.png)

## Reports

### 1. Timing

* Clock period = 7 ns, freq = 142.857 MHz
* Slack = 0.904 ns

```
---------------------------------------------------------------------------------------------------
From Clock:  axis_clk
  To Clock:  axis_clk

Setup :            0  Failing Endpoints,  Worst Slack        0.904ns,  Total Violation        0.000ns
Hold  :            0  Failing Endpoints,  Worst Slack        0.145ns,  Total Violation        0.000ns
PW    :            0  Failing Endpoints,  Worst Slack        3.000ns,  Total Violation        0.000ns
---------------------------------------------------------------------------------------------------
```

* Max delay path: FSM output to X buffer

```
Max Delay Paths
--------------------------------------------------------------------------------------
Slack (MET) :             0.904ns  (required time - arrival time)
  Source:                 state_reg[2]/C
                            (rising edge-triggered cell FDCE clocked by axis_clk  {rise@0.000ns fall@3.500ns period=7.000ns})
  Destination:            x_buf_reg[0][0]/CE
                            (rising edge-triggered cell FDCE clocked by axis_clk  {rise@0.000ns fall@3.500ns period=7.000ns})
  Path Group:             axis_clk
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            7.000ns  (axis_clk rise@7.000ns - axis_clk rise@0.000ns)
  Data Path Delay:        5.714ns  (logic 1.269ns (22.209%)  route 4.445ns (77.791%))
  Logic Levels:           5  (LUT3=2 LUT5=2 LUT6=1)
  Clock Path Skew:        -0.145ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    2.128ns = ( 9.128 - 7.000 )
    Source Clock Delay      (SCD):    2.456ns
    Clock Pessimism Removal (CPR):    0.184ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock axis_clk rise edge)
                                                      0.000     0.000 r
                                                      0.000     0.000 r  axis_clk (IN)
                         net (fo=0)                   0.000     0.000    axis_clk
                         IBUF (Prop_ibuf_I_O)         0.972     0.972 r  axis_clk_IBUF_inst/O
                         net (fo=1, unplaced)         0.800     1.771    axis_clk_IBUF
                         BUFG (Prop_bufg_I_O)         0.101     1.872 r  axis_clk_IBUF_BUFG_inst/O
                         net (fo=412, unplaced)       0.584     2.456    axis_clk_IBUF_BUFG
                         FDCE                                         r  state_reg[2]/C
  -------------------------------------------------------------------    -------------------
                         FDCE (Prop_fdce_C_Q)         0.478     2.934 f  state_reg[2]/Q
                         net (fo=55, unplaced)        0.826     3.760    state[2]
                         LUT3 (Prop_lut3_I0_O)        0.295     4.055 f  ss_tready_OBUF_inst_i_4/O
                         net (fo=23, unplaced)        1.174     5.229    ss_tready_OBUF_inst_i_4_n_0
                         LUT5 (Prop_lut5_I0_O)        0.124     5.353 f  tap_addr_pre[9]_i_2/O
                         net (fo=2, unplaced)         0.460     5.813    y_stall
                         LUT6 (Prop_lut6_I5_O)        0.124     5.937 f  tap_A_OBUF[11]_inst_i_2/O
                         net (fo=26, unplaced)        0.968     6.905    stall
                         LUT5 (Prop_lut5_I0_O)        0.124     7.029 r  x_buf_wp[1]_i_2/O
                         net (fo=4, unplaced)         0.473     7.502    x_buf_wp0
                         LUT3 (Prop_lut3_I1_O)        0.124     7.626 r  x_buf[0][31]_i_1/O
                         net (fo=32, unplaced)        0.544     8.170    x_buf[0][31]_i_1_n_0
                         FDCE                                         r  x_buf_reg[0][0]/CE
  -------------------------------------------------------------------    -------------------

                         (clock axis_clk rise edge)
                                                      7.000     7.000 r
                                                      0.000     7.000 r  axis_clk (IN)
                         net (fo=0)                   0.000     7.000    axis_clk
                         IBUF (Prop_ibuf_I_O)         0.838     7.838 r  axis_clk_IBUF_inst/O
                         net (fo=1, unplaced)         0.760     8.598    axis_clk_IBUF
                         BUFG (Prop_bufg_I_O)         0.091     8.689 r  axis_clk_IBUF_BUFG_inst/O
                         net (fo=412, unplaced)       0.439     9.128    axis_clk_IBUF_BUFG
                         FDCE                                         r  x_buf_reg[0][0]/C
                         clock pessimism              0.184     9.311
                         clock uncertainty           -0.035     9.276
                         FDCE (Setup_fdce_C_CE)      -0.202     9.074    x_buf_reg[0][0]
  -------------------------------------------------------------------
                         required time                          9.074
                         arrival time                          -8.170
  -------------------------------------------------------------------
                         slack                                  0.904
```

### 2. Resources

* LUT, FF

```
+-------------------------+------+-------+------------+-----------+-------+
|        Site Type        | Used | Fixed | Prohibited | Available | Util% |
+-------------------------+------+-------+------------+-----------+-------+
| Slice LUTs*             |  427 |     0 |          0 |     53200 |  0.80 |
|   LUT as Logic          |  427 |     0 |          0 |     53200 |  0.80 |
|   LUT as Memory         |    0 |     0 |          0 |     17400 |  0.00 |
| Slice Registers         |  412 |     0 |          0 |    106400 |  0.39 |
|   Register as Flip Flop |  412 |     0 |          0 |    106400 |  0.39 |
|   Register as Latch     |    0 |     0 |          0 |    106400 |  0.00 |
| F7 Muxes                |    0 |     0 |          0 |     26600 |  0.00 |
| F8 Muxes                |    0 |     0 |          0 |     13300 |  0.00 |
+-------------------------+------+-------+------------+-----------+-------+
```

* BRAM

```
+----------------+------+-------+------------+-----------+-------+
|    Site Type   | Used | Fixed | Prohibited | Available | Util% |
+----------------+------+-------+------------+-----------+-------+
| Block RAM Tile |    0 |     0 |          0 |       140 |  0.00 |
|   RAMB36/FIFO* |    0 |     0 |          0 |       140 |  0.00 |
|   RAMB18       |    0 |     0 |          0 |       280 |  0.00 |
+----------------+------+-------+------------+-----------+-------+
```
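
As a cross-check of the numbers reported in the Testbench and Timing sections, the short Python sketch below redoes the arithmetic. It only uses figures quoted in this report; the variable names are hypothetical. The frequency estimate assumes the reported setup path is the only limit, and since the report lists the nets as unplaced it should be read as a rough bound rather than a measured result.

```python
CLOCK_PERIOD_NS = 7.0      # clock period from the Timing section
SETUP_SLACK_NS = 0.904     # worst setup slack from the Timing section
TOTAL_CYCLES = 6601        # measured for data_length = 600
DATA_LENGTH = 600
CYCLES_PER_Y = 11          # cycles per Y output

# Cycles beyond the steady-state 11 cycles per output.
overhead_cycles = TOTAL_CYCLES - DATA_LENGTH * CYCLES_PER_Y

# Run time of the measured window at the constrained clock.
run_time_us = TOTAL_CYCLES * CLOCK_PERIOD_NS / 1000.0

# Rough estimate: shortest period that would still meet setup on the reported path.
min_period_ns = CLOCK_PERIOD_NS - SETUP_SLACK_NS
est_fmax_mhz = 1000.0 / min_period_ns

print(overhead_cycles)            # 1
print(round(run_time_us, 3))      # 46.207
print(round(est_fmax_mhz, 1))     # about 164 MHz
```

The single overhead cycle shows that the 6601-cycle total is almost entirely the 11 cycles per output, and the slack leaves some headroom above the 142.857 MHz constraint under the assumption stated above.
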