## Introduction

This report describes a Verilog implementation of a Finite Impulse Response (FIR) filter module. The design is parameterized and combines several interfaces: AXI-Lite for control and configuration, and AXI-Stream for sample data. The module also drives two BRAM interfaces. One stores the tap coefficients (tap RAM) and the other stores the input samples (data RAM). The design targets FPGA-based digital signal processing applications.

## Module Parameters and I/O

### Parameters

* ***pADDR_WIDTH***: Address width, which sets the addressing range of the memory interfaces.
* ***pDATA_WIDTH***: Data bus width, which sets the bit precision of both the input samples and the filter coefficients.
* ***Tape_Num***: Number of taps (coefficients) in the filter. Because it is a parameter, the filter length is configurable.

### Ports and Interfaces

![IMG_0638](https://hackmd.io/_uploads/BkDkEPpQex.png)

#### AXI-Lite Interface

This interface is used for control and configuration:

* Read path:
    * Inputs: ***arvalid, araddr, rready***
    * Outputs: ***arready, rvalid, rdata***
* Write path:
    * Inputs: ***awvalid, awaddr, wvalid, wdata***
    * Outputs: ***awready, wready***

#### AXI-Stream Interface

This interface carries the sample data at high throughput:

* Input stream (x[n]):
    * Inputs: ***ss_tvalid, ss_tdata, ss_tlast***
    * Output: ***ss_tready*** (signals that the module can accept a sample)
* Output stream (y[n]):
    * Input: ***sm_tready***
    * Outputs: ***sm_tvalid, sm_tdata, sm_tlast*** (***sm_tlast*** marks the last output sample)

#### BRAM Interfaces

![IMG_0640](https://hackmd.io/_uploads/S14pF56Qle.jpg)

The design uses two separate BRAMs. Signal directions are given from the BRAM's point of view. The BRAM inputs are driven by the FIR module, and the BRAM output is read by it:

* Tap RAM (coefficient storage):
    * Inputs: ***tap_WE, tap_EN, tap_Di, tap_A***
    * Output: ***tap_Do***
* Data RAM (sample storage):
    * Inputs: ***data_WE, data_EN, data_Di, data_A***
    * Output: ***data_Do***

#### Clock and Reset

* ***axis_clk***: The primary clock for all synchronous logic.
* ***axis_rst_n***: Asynchronous, active-low reset that initializes the module.

## Architecture

### FIR Processing FSM

```verilog
always @* begin
    // Default: hold the current state (also prevents latch inference)
    fir_state_next = fir_state;
    case (fir_state)
        FIR_IDLE: fir_state_next = (ap_start == 1) ? DATA_RST : FIR_IDLE;
        DATA_RST: begin
            // Zero the data RAM, one address per cycle
            if (dataRam_rst_cnt == tap_number)
                fir_state_next = FIR_WAIT;
            else
                fir_state_next = DATA_RST;
        end
        FIR_WAIT: fir_state_next = (x_buffer_count == 1) ? FIR_SSIN : FIR_WAIT;
        FIR_SSIN: fir_state_next = FIR_RUN;
        FIR_RUN:  fir_state_next = (k == tap_number) ? FIR_CAL : FIR_RUN;
        FIR_CAL: begin
            if (y_buffer_count == 0 && x_buffer_count == 1) begin
                if (data_count[pDATA_WIDTH-1:0] == data_length[pDATA_WIDTH-1:0])
                    fir_state_next = FIR_IDLE;
                else
                    fir_state_next = FIR_SSIN;
            end else if (y_buffer_count == 0 && x_buffer_count == 0) begin
                if (data_count[pDATA_WIDTH-1:0] == data_length[pDATA_WIDTH-1:0])
                    fir_state_next = FIR_IDLE;
                else
                    fir_state_next = FIR_WAIT;
            end else begin
                // y_buffer still holds an unsent result: wait for the output handshake
                fir_state_next = FIR_OUT;
            end
        end
        FIR_OUT: begin
            if (sm_tready && sm_tvalid) begin
                if (data_count[pDATA_WIDTH-1:0] == data_length[pDATA_WIDTH-1:0])
                    fir_state_next = FIR_IDLE;
                else if (x_buffer_count == 1)
                    fir_state_next = FIR_SSIN;
                else
                    fir_state_next = FIR_WAIT;
            end else begin
                fir_state_next = FIR_OUT;
            end
        end
        default: fir_state_next = fir_state;
    endcase
end
```

The core FIR operation is controlled by a dedicated FSM with the following states:

* ***FIR_IDLE:*** The default state, in which the module waits for the start command (ap_start).
* ***DATA_RST:*** Clears the data RAM by writing zeros sequentially until dataRam_rst_cnt reaches tap_number.
* ***FIR_WAIT:*** Waits until a valid input sample has arrived over AXI-Stream (ss_tvalid) and is held in x_buffer (x_buffer_count == 1).
* ***FIR_SSIN:*** Asserts ss_tready to accept streaming data and loads the sample held in x_buffer into the MAC input register, together with a coefficient from tap RAM.
* ***FIR_STORE:*** After a sample has been captured, stores it in the data RAM.
* ***FIR_RUN:*** Reads from both the tap RAM and the data RAM to perform the multiply-accumulate (MAC) operations. A counter, k, steps through the taps, and the FSM moves to FIR_CAL when k reaches tap_number.
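* ***FIR_CAL:*** Completes the last multiply-accumulate steps in the pipeline registers. If y_buffer is empty, the finished result is written into it. The FSM then returns to FIR_IDLE when all data_length samples have been processed, continues with FIR_SSIN when the next sample is already waiting in x_buffer, or goes to FIR_WAIT otherwise.
* ***FIR_OUT:*** Used when a new result is ready but y_buffer is still occupied. The FSM waits for the sm_tvalid/sm_tready handshake that sends the buffered result out through the AXI-Stream output interface, after which the new result (y) moves into y_buffer. The FSM returns to FIR_IDLE once data_count equals data_length. Otherwise it goes to FIR_SSIN if a new sample is already buffered, or to FIR_WAIT.

### Multiply-Accumulate Operation

```verilog
// Pipeline registers: h/a (operands) -> m (product) -> y (accumulator)
always @(posedge axis_clk or negedge axis_rst_n) begin
    if (~axis_rst_n) begin
        y <= 0;
        m <= 0;
        h <= 0;
        a <= 0;
    end else if (fir_state == FIR_SSIN) begin
        y <= 0;            // new output sample: clear the accumulator
        m <= m_next;
        h <= h_next;
        a <= a_next;
    end else begin
        y <= y_next;       // accumulate
        m <= m_next;
        h <= h_next;
        a <= a_next;
    end
end

always @* begin
    y_next = y + m;        // add the previous product to the running sum
    m_next = h * a;        // multiply the current operands
    if (fir_state == FIR_SSIN) begin
        a_next = x_buffer; // newest sample comes straight from the input buffer
        h_next = tap_Do;
    end else if (fir_state == FIR_RUN || fir_state == FIR_CAL) begin
        a_next = data_Do;  // stored samples are read back from data RAM
        h_next = tap_Do;
    end else begin
        a_next = 0;        // other states: feed zeros into the pipeline
        h_next = 0;
    end
end
```

***Intermediate Registers:***

* ***h***: The current tap coefficient, read from tap RAM (tap_Do).
* ***a***: The current input sample. It comes from x_buffer in FIR_SSIN and from data RAM (data_Do) in FIR_RUN and FIR_CAL.
* ***m***: The product h × a, i.e. the contribution of the current tap.
* ***y***: The accumulator, which sums the products over all taps to form $y[n] = \sum_{i=0}^{N-1} h[i]\,x[n-i]$, where N is the number of taps (Tape_Num).

***Calculation Flow:***

The next-state values are computed combinationally on every clock cycle, not only in FIR_CAL:

1. ***m_next*** = h * a computes the contribution of the current tap.
2. ***y_next*** = y + m adds the previous product to the running sum.
3. All registers update on the next clock edge, so operand fetch (h, a), multiplication (m), and accumulation (y) form a pipeline. In FIR_SSIN, y is cleared so that each output sample starts from zero.

## Performance

Adding x_buffer and y_buffer decouples the AXI-Stream handshakes from the MAC loop. The next input sample can be captured while the current output is still being computed, and a finished output can wait in y_buffer for sm_tready without holding up the next computation. With these buffers, the FIR spends only N + 1 cycles to produce the results when data_length is N.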
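The effect of the buffers can be illustrated with a generic one-entry holding register on a stream input. This is a minimal sketch under stated assumptions, not this design's implementation. The module `stream_hold_buf` and its ports `take` and `full` are hypothetical. Unlike the design's x_buffer, it generates its own ready signal from the buffer state. A word is captured on any cycle with `s_tvalid && s_tready`, so the producer can hand over the next sample while the consumer (for example, an FSM busy in its MAC loop) has not yet asked for it.

```verilog
// Hypothetical one-entry AXI-Stream holding buffer (illustration only).
module stream_hold_buf #(
    parameter W = 32
) (
    input  wire         clk,
    input  wire         rst_n,     // asynchronous, active-low (like axis_rst_n)
    // upstream AXI-Stream side
    input  wire         s_tvalid,
    input  wire [W-1:0] s_tdata,
    output wire         s_tready,
    // consumer side (e.g. an FSM state that loads the sample)
    input  wire         take,      // hypothetical: consume the held word this cycle
    output reg  [W-1:0] data,
    output reg          full
);
    // Ready when empty, or when the held word is consumed in this same cycle
    assign s_tready = ~full | take;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            data <= {W{1'b0}};
            full <= 1'b0;
        end else if (s_tvalid && s_tready) begin
            data <= s_tdata;       // handshake: capture (or replace) the word
            full <= 1'b1;
        end else if (take) begin
            full <= 1'b0;          // word consumed, nothing new arrived
        end
    end
endmodule
```

Driving `s_tready` from `~full | take` lets the buffer be emptied and refilled in the same cycle, so a one-entry buffer does not insert a bubble between samples. The cost is a combinational path from `take` to `s_tready`, which a registered skid buffer would avoid. The buffer logic used in this design is listed next.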
```verilog
// x buffer: holds one input sample received on the AXI-Stream slave port
always @(posedge axis_clk or negedge axis_rst_n) begin
    if (~axis_rst_n) begin
        x_buffer       <= 0;
        x_buffer_count <= 0;
    end else if (ss_tready && ss_tvalid) begin
        x_buffer       <= ss_tdata;   // handshake: capture the new sample
        x_buffer_count <= 1;
    end else if (fir_state == FIR_SSIN) begin
        x_buffer       <= 0;          // sample consumed by the MAC pipeline
        x_buffer_count <= 0;
    end else begin
        x_buffer       <= x_buffer;   // hold
        x_buffer_count <= x_buffer_count;
    end
end

// y buffer: holds one finished output until the AXI-Stream master handshake
always @(posedge axis_clk or negedge axis_rst_n) begin
    if (~axis_rst_n) begin
        y_buffer       <= 0;
        y_buffer_count <= 0;
    end else if (fir_state == FIR_CAL && y_buffer_count == 0) begin
        y_buffer       <= y_next;     // store the result as it completes
        y_buffer_count <= 1;
    end else if (fir_state == FIR_OUT && y_buffer_count == 1) begin
        if (sm_tvalid && sm_tready) begin
            y_buffer       <= y;      // old result sent: load the pending one
            y_buffer_count <= 1;
        end else begin
            y_buffer       <= y_buffer;
            y_buffer_count <= y_buffer_count;
        end
    end else if (sm_tvalid && sm_tready) begin
        y_buffer       <= 0;          // result sent, buffer now empty
        y_buffer_count <= 0;
    end else begin
        y_buffer       <= y_buffer;
        y_buffer_count <= y_buffer_count;
    end
end
```

## Conclusion

This FIR filter module integrates configuration, memory management, and streaming data processing in a single Verilog implementation. The AXI-Lite interface provides flexible control, and the AXI-Stream interfaces carry the sample data in real time. The two BRAM interfaces hold the filter coefficients and the input samples.