## Introduction
This report details a Verilog implementation of a Finite Impulse Response (FIR) filter module. The design is parameterized and integrates multiple interfaces including AXI-Lite for control and AXI-Stream for data transfer. Additionally, the module uses two BRAM interfaces for storing tap coefficients (tap RAM) and input samples (data RAM). The overall design is intended for FPGA-based digital signal processing applications.
## Module Parameters and I/O
### Parameters
* ***pADDR_WIDTH***: Specifies the address width. This parameter defines the addressing range for memory modules.
* ***pDATA_WIDTH***: Defines the data bus width, which influences the bit precision of both input samples and filter coefficients.
* ***Tape_Num***: Represents the number of taps (or coefficients) in the filter. In this design, it is parameterized to allow flexibility in filter length.
### Ports and Interfaces

#### AXI-Lite Interface
This interface is used for control and configuration:
* Read Path:
  * Inputs: ***arvalid, araddr, rready***
  * Outputs: ***arready, rvalid, rdata***
* Write Path:
  * Inputs: ***awvalid, awaddr, wvalid, wdata***
  * Outputs: ***awready, wready***
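The read-path handshake can be illustrated with a minimal sketch. It accepts an address when ***arvalid*** and ***arready*** are both high, then holds ***rvalid*** until the master asserts ***rready***. The module name `axil_read_sketch`, the single readable register `reg0` at offset 0, and the default widths are hypothetical; the design's actual register map is not described in this report.

```verilog
// Minimal AXI-Lite read-channel sketch (hypothetical, not the design's code).
// One readable register at offset 0; every other address returns 0.
module axil_read_sketch #(
    parameter pADDR_WIDTH = 12,
    parameter pDATA_WIDTH = 32
) (
    input  wire                   axis_clk,
    input  wire                   axis_rst_n,
    input  wire                   arvalid,
    input  wire [pADDR_WIDTH-1:0] araddr,
    output wire                   arready,
    input  wire                   rready,
    output reg                    rvalid,
    output reg  [pDATA_WIDTH-1:0] rdata,
    input  wire [pDATA_WIDTH-1:0] reg0     // hypothetical register at offset 0
);
    // Accept a new address only while no read response is pending
    assign arready = ~rvalid;

    always @(posedge axis_clk or negedge axis_rst_n) begin
        if (~axis_rst_n) begin
            rvalid <= 1'b0;
            rdata  <= 0;
        end else if (arvalid && arready) begin
            // Address handshake: latch the read data and raise rvalid
            rvalid <= 1'b1;
            rdata  <= (araddr == 0) ? reg0 : 0;
        end else if (rvalid && rready) begin
            // Data handshake completed: the response has been consumed
            rvalid <= 1'b0;
        end
    end
endmodule
```

Tying ***arready*** to the inverse of ***rvalid*** keeps at most one read outstanding, which is usually sufficient for a control interface.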
#### AXI-Stream Interface
This interface supports high-throughput data transactions:
* Input Stream (x[n]):
  * Inputs: ***ss_tvalid, ss_tdata, ss_tlast***
  * Output: ***ss_tready*** (signals that the module can accept data)
* Output Stream (y[n]):
  * Input: ***sm_tready***
  * Outputs: ***sm_tvalid, sm_tdata, sm_tlast*** (***sm_tlast*** marks the last sample)
#### BRAM Interfaces

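The design uses two separate BRAMs. The FIR module drives each BRAM's write-enable, enable, write-data, and address ports and reads back its data output:
* Tap RAM (Coefficient Storage):
  * Outputs (module to BRAM): ***tap_WE, tap_EN, tap_Di, tap_A***
  * Input (BRAM to module): ***tap_Do***
* Data RAM (Sample Storage):
  * Outputs (module to BRAM): ***data_WE, data_EN, data_Di, data_A***
  * Input (BRAM to module): ***data_Do***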
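The MAC timing depends on how the BRAMs respond, so a behavioral BRAM model is useful for simulation. The sketch below makes several assumptions: 32-bit words, a byte address on `A` (word index `A >> 2`), a 4-bit byte write enable `WE`, and a registered read with one cycle of latency. The module name `bram_model` and the `DEPTH` parameter are hypothetical.

```verilog
// Behavioral single-port BRAM model (sketch).
// Assumptions: 32-bit words, byte-addressed A, 4-bit byte write enable,
// one cycle of read latency on Do.
module bram_model #(
    parameter pADDR_WIDTH = 12,
    parameter DEPTH       = 16      // number of 32-bit words (hypothetical)
) (
    input                        clk,
    input      [3:0]             WE,
    input                        EN,
    input      [31:0]            Di,
    input      [pADDR_WIDTH-1:0] A,
    output reg [31:0]            Do
);
    reg  [31:0]            mem [0:DEPTH-1];
    // Drop the 2-bit byte offset to get the word index
    wire [pADDR_WIDTH-3:0] word_addr = A[pADDR_WIDTH-1:2];

    always @(posedge clk) begin
        if (EN) begin
            // Byte-wise write: WE[i] enables byte lane i
            if (WE[0]) mem[word_addr][7:0]   <= Di[7:0];
            if (WE[1]) mem[word_addr][15:8]  <= Di[15:8];
            if (WE[2]) mem[word_addr][23:16] <= Di[23:16];
            if (WE[3]) mem[word_addr][31:24] <= Di[31:24];
            // Registered read: Do returns the pre-write contents one cycle later
            Do <= mem[word_addr];
        end
    end
endmodule
```

Because `Do` is registered, any logic that uses `tap_Do` or `data_Do` must issue the address one cycle before the value is needed.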
#### Clock and Reset
* ***axis_clk***: The primary clock for all synchronous logic.
* ***axis_rst_n***: Asynchronous, active-low reset that initializes the module.
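Collecting the ports above, a module declaration might look like the following sketch. The module name `fir`, the default parameter values, and the port widths (for example, 4-bit byte write enables on the BRAM ports) are assumptions and do not come from the source. The module drives each BRAM's `WE`, `EN`, `Di`, and `A` ports and reads its `Do` port, because those are the RAM's inputs and output, respectively.

```verilog
// Port-level sketch of the FIR module; widths and defaults are assumptions.
module fir #(
    parameter pADDR_WIDTH = 12,
    parameter pDATA_WIDTH = 32,
    parameter Tape_Num    = 11        // placeholder default
) (
    // AXI-Lite (the module is the slave)
    output wire                   awready,
    output wire                   wready,
    input  wire                   awvalid,
    input  wire [pADDR_WIDTH-1:0] awaddr,
    input  wire                   wvalid,
    input  wire [pDATA_WIDTH-1:0] wdata,
    output wire                   arready,
    input  wire                   rready,
    input  wire                   arvalid,
    input  wire [pADDR_WIDTH-1:0] araddr,
    output wire                   rvalid,
    output wire [pDATA_WIDTH-1:0] rdata,
    // AXI-Stream input, x[n]
    input  wire                   ss_tvalid,
    input  wire [pDATA_WIDTH-1:0] ss_tdata,
    input  wire                   ss_tlast,
    output wire                   ss_tready,
    // AXI-Stream output, y[n]
    input  wire                   sm_tready,
    output wire                   sm_tvalid,
    output wire [pDATA_WIDTH-1:0] sm_tdata,
    output wire                   sm_tlast,
    // Tap RAM: module drives WE/EN/Di/A, reads Do
    output wire [3:0]             tap_WE,
    output wire                   tap_EN,
    output wire [pDATA_WIDTH-1:0] tap_Di,
    output wire [pADDR_WIDTH-1:0] tap_A,
    input  wire [pDATA_WIDTH-1:0] tap_Do,
    // Data RAM: module drives WE/EN/Di/A, reads Do
    output wire [3:0]             data_WE,
    output wire                   data_EN,
    output wire [pDATA_WIDTH-1:0] data_Di,
    output wire [pADDR_WIDTH-1:0] data_A,
    input  wire [pDATA_WIDTH-1:0] data_Do,
    // Clock and reset
    input  wire                   axis_clk,
    input  wire                   axis_rst_n
);
    // Internal logic (FSM, MAC pipeline, buffers) is omitted in this sketch.
endmodule
```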
## Architecture
### FIR Processing FSM
```verilog
always @* begin
    fir_state_next = fir_state;  // default assignment: avoids inferred latches
    case (fir_state)
        FIR_IDLE: fir_state_next = (ap_start == 1) ? DATA_RST : FIR_IDLE;
        // Zero the data RAM before the first sample is processed
        DATA_RST: fir_state_next = (dataRam_rst_cnt == tap_number) ? FIR_WAIT : DATA_RST;
        // Wait until x_buffer holds a sample
        FIR_WAIT: fir_state_next = (x_buffer_count == 1) ? FIR_SSIN : FIR_WAIT;
        FIR_SSIN: fir_state_next = FIR_RUN;
        FIR_RUN:  fir_state_next = (k == tap_number) ? FIR_CAL : FIR_RUN;
        FIR_CAL: begin
            if (y_buffer_count == 0 && x_buffer_count == 1) begin
                if (data_count[pDATA_WIDTH-1:0] == data_length[pDATA_WIDTH-1:0]) fir_state_next = FIR_IDLE;
                else fir_state_next = FIR_SSIN;
            end else if (y_buffer_count == 0 && x_buffer_count == 0) begin
                if (data_count[pDATA_WIDTH-1:0] == data_length[pDATA_WIDTH-1:0]) fir_state_next = FIR_IDLE;
                else fir_state_next = FIR_WAIT;
            end else begin
                // y_buffer still holds the previous result: wait for it to be sent
                fir_state_next = FIR_OUT;
            end
        end
        FIR_OUT: begin
            if (sm_tready && sm_tvalid) begin
                if (data_count[pDATA_WIDTH-1:0] == data_length[pDATA_WIDTH-1:0]) fir_state_next = FIR_IDLE;
                else if (x_buffer_count == 1) fir_state_next = FIR_SSIN;
                else fir_state_next = FIR_WAIT;
            end else begin
                fir_state_next = FIR_OUT;
            end
        end
        default: fir_state_next = fir_state;
    endcase
end
```
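The core FIR operation is managed by a dedicated FSM with the following states:
* ***FIR_IDLE:*** The default state, in which the module waits for ***ap_start*** to be asserted.
* ***DATA_RST:*** Clears the data RAM by writing zeros sequentially until ***dataRam_rst_cnt*** reaches ***tap_number***, so that no stale samples contribute to the first outputs.
* ***FIR_WAIT:*** Waits until ***x_buffer*** holds a sample, that is, until an AXI-Stream input handshake (***ss_tvalid*** and ***ss_tready*** both high) has been captured and ***x_buffer_count*** is 1.
* ***FIR_SSIN:*** Loads the buffered sample from ***x_buffer*** and the first coefficient from ***tap_Do*** into the MAC operand registers, clears the accumulator ***y***, and empties ***x_buffer*** so that the next sample can be accepted.
* ***FIR_STORE:*** Stores the captured sample in the data RAM. This state does not appear in the next-state logic above; in the listed FSM, the new sample is consumed directly from ***x_buffer*** in FIR_SSIN.
* ***FIR_RUN:*** Reads from both tap RAM and data RAM to perform the multiply-accumulate (MAC) operations. The counter ***k*** steps through the taps until it reaches ***tap_number***.
* ***FIR_CAL:*** Completes the final MAC operations still in the pipeline. If ***y_buffer*** is empty, the result is latched into it and the FSM moves to FIR_SSIN (next sample already buffered), FIR_WAIT (no sample yet), or FIR_IDLE (all ***data_length*** samples processed).
* ***FIR_OUT:*** Holds the FSM while a pending result waits for the AXI-Stream output handshake (***sm_tvalid*** and ***sm_tready*** both high). After the handshake, the FSM returns to FIR_IDLE if ***data_count*** equals ***data_length***, and otherwise continues with FIR_SSIN or FIR_WAIT depending on ***x_buffer_count***.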
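The next-state logic relies on state constants and a state register that are not listed. The following minimal sketch supplies them. It assumes a 3-bit binary encoding and uses a hypothetical wrapper module `fir_state_reg`. Only the state names and the reset polarity come from the design, and FIR_STORE is left out because the transitions never reach it.

```verilog
// State constants and state register (sketch; the encoding is an assumption).
module fir_state_reg (
    input  wire       axis_clk,
    input  wire       axis_rst_n,
    input  wire [2:0] fir_state_next,
    output reg  [2:0] fir_state
);
    localparam FIR_IDLE = 3'd0,
               DATA_RST = 3'd1,
               FIR_WAIT = 3'd2,
               FIR_SSIN = 3'd3,
               FIR_RUN  = 3'd4,
               FIR_CAL  = 3'd5,
               FIR_OUT  = 3'd6;

    // Asynchronous active-low reset returns the FSM to FIR_IDLE
    always @(posedge axis_clk or negedge axis_rst_n) begin
        if (~axis_rst_n) fir_state <= FIR_IDLE;
        else             fir_state <= fir_state_next;
    end
endmodule
```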
### Multiply-Accumulate Operation
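```verilog
// Pipeline registers: h/a (operands) -> m (product) -> y (accumulator)
always @(posedge axis_clk or negedge axis_rst_n) begin
    if (~axis_rst_n) begin
        y <= 0;
        m <= 0;
        h <= 0;
        a <= 0;
    end else begin
        // FIR_SSIN starts a new output sample, so the accumulator is cleared
        y <= (fir_state == FIR_SSIN) ? 0 : y_next;
        m <= m_next;
        h <= h_next;
        a <= a_next;
    end
end

always @* begin
    y_next = y + m;   // accumulate the product registered in the previous cycle
    m_next = h * a;   // multiply the registered operands
    if (fir_state == FIR_SSIN) begin
        a_next = x_buffer;   // newest sample comes straight from x_buffer
        h_next = tap_Do;
    end else if (fir_state == FIR_RUN || fir_state == FIR_CAL) begin
        a_next = data_Do;    // older samples come from the data RAM
        h_next = tap_Do;
    end else begin
        a_next = 0;          // feed zeros in all other states
        h_next = 0;
    end
end
```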
***Intermediate Registers:***
* ***h***: The current tap coefficient read from tap RAM (***tap_Do***).
* ***a***: The current input sample, taken from ***x_buffer*** in FIR_SSIN and from data RAM (***data_Do***) in FIR_RUN and FIR_CAL.
* ***m***: The product of ***h*** and ***a***, that is, the contribution of one tap.
* ***y***: The accumulator, which sums the products across all taps to form the output value.
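Over one output sample, the accumulator implements the direct-form FIR sum. Let N denote ***tap_number***, and assume that the data RAM supplies the samples x[n], x[n−1], …, x[n−N+1] in step with the coefficients h[0], …, h[N−1] (the RAM address ordering is not shown in the listed code). The value collected in ***y*** is then:

$$
y[n] = \sum_{i=0}^{N-1} h[i]\,x[n-i], \qquad x[k] = 0 \ \text{for}\ k < 0
$$

The condition x[k] = 0 for k < 0 corresponds to DATA_RST, which zero-fills the data RAM before the first sample arrives.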
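***Calculation Flow:***
The combinational next-value logic is evaluated on every clock cycle, not only in FIR_CAL:
1. ***m_next*** = h * a computes the contribution of the operand pair registered in the previous cycle.
2. ***y_next*** = y + m adds the product computed one cycle earlier to the accumulator.
3. All four registers update on the next clock edge, so operand fetch, multiplication, and accumulation form a three-stage pipeline. A product reaches ***y*** two clock edges after its operands are registered in ***h*** and ***a***, so the final products are still in flight when FIR_RUN ends, and FIR_CAL completes them.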
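The pipeline latency can be checked in isolation. The standalone sketch below reproduces the h/a → m → y register chain. The module name `mac_pipe` is hypothetical, and the sketch assumes 32-bit operands with products truncated to 32 bits. The `clear` input stands in for the FIR_SSIN condition and `load` for FIR_RUN or FIR_CAL.

```verilog
// Standalone three-stage MAC pipeline (sketch mirroring h/a -> m -> y).
module mac_pipe #(parameter W = 32) (
    input  wire         clk,
    input  wire         rst_n,
    input  wire         clear,   // like FIR_SSIN: restart the accumulation
    input  wire         load,    // operands valid this cycle
    input  wire [W-1:0] h_in,    // coefficient (tap_Do)
    input  wire [W-1:0] a_in,    // sample (data_Do or x_buffer)
    output reg  [W-1:0] y
);
    reg [W-1:0] h, a, m;

    always @(posedge clk or negedge rst_n) begin
        if (~rst_n) begin
            h <= 0; a <= 0; m <= 0; y <= 0;
        end else begin
            h <= load ? h_in : 0;      // stage 1: register operands (zeros when idle)
            a <= load ? a_in : 0;
            m <= h * a;                // stage 2: multiply (truncated to W bits)
            y <= clear ? 0 : y + m;    // stage 3: accumulate, or clear as in FIR_SSIN
        end
    end
endmodule
```

If operand pairs (1, 4), (2, 5), and (3, 6) are loaded on three consecutive cycles, ***y*** reads 14 one edge after the last pair is registered. It reaches 32 (1·4 + 2·5 + 3·6) one edge later, which shows the two-cycle drain.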
## Performance
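With x_buffer and y_buffer, accepting the next input sample and transmitting the previous result overlap with the MAC computation instead of stalling it. As a result, the FIR spends only N + 1 cycles to produce the results when data_length is N.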
```verilog
// x buffer: one-entry input buffer for x[n]
always @(posedge axis_clk or negedge axis_rst_n) begin
    if (~axis_rst_n) begin
        x_buffer       <= 0;
        x_buffer_count <= 0;
    end else if (ss_tready && ss_tvalid) begin
        // Capture a new sample on the AXI-Stream handshake
        x_buffer       <= ss_tdata;
        x_buffer_count <= 1;
    end else if (fir_state == FIR_SSIN) begin
        // FIR_SSIN consumes the sample, freeing the buffer
        x_buffer       <= 0;
        x_buffer_count <= 0;
    end
    // Otherwise both registers keep their current values
end

// y buffer: one-entry output buffer for y[n]
always @(posedge axis_clk or negedge axis_rst_n) begin
    if (~axis_rst_n) begin
        y_buffer       <= 0;
        y_buffer_count <= 0;
    end else if (fir_state == FIR_CAL && y_buffer_count == 0) begin
        // Latch the finished result
        y_buffer       <= y_next;
        y_buffer_count <= 1;
    end else if (fir_state == FIR_OUT && y_buffer_count == 1) begin
        // Previous result accepted: replace it with the result held in y
        if (sm_tvalid && sm_tready) begin
            y_buffer       <= y;
            y_buffer_count <= 1;
        end
    end else if (sm_tvalid && sm_tready) begin
        // Result accepted downstream: the buffer is empty again
        y_buffer       <= 0;
        y_buffer_count <= 0;
    end
end
```
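The handshake outputs that these buffers imply are not listed. The sketch below shows one consistent way to derive them from the buffer counters. The module name `stream_glue` is hypothetical, the counters are assumed to be 1 bit wide, and the assignments themselves are assumptions rather than the design's code.

```verilog
// Hypothetical glue: derive the AXI-Stream handshakes from the buffer counters.
module stream_glue #(parameter pDATA_WIDTH = 32) (
    input  wire                   x_buffer_count,
    input  wire                   y_buffer_count,
    input  wire [pDATA_WIDTH-1:0] y_buffer,
    output wire                   ss_tready,
    output wire                   sm_tvalid,
    output wire [pDATA_WIDTH-1:0] sm_tdata
);
    // Accept a new x[n] only while x_buffer is empty
    assign ss_tready = (x_buffer_count == 1'b0);
    // Present y_buffer on the output stream whenever it holds a result
    assign sm_tvalid = (y_buffer_count == 1'b1);
    assign sm_tdata  = y_buffer;
endmodule
```

In the x buffer block, the capture branch has priority over the FIR_SSIN clear. As a result, ***ss_tready*** could also be asserted during FIR_SSIN to accept the next sample one cycle earlier; the simpler form above accepts a new sample only when the buffer is empty.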
## Conclusion
This FIR filter module integrates configuration, memory management, and streaming data processing in a single Verilog implementation. The AXI-Lite interface provides control and configuration, and the AXI-Stream interfaces carry input samples and results in real time. The two BRAM interfaces keep filter coefficients and input samples in separate memories, so one tap and one sample can be read in the same cycle during the MAC loop, while x_buffer and y_buffer let stream transfers overlap with computation.