# Lab3_FIR updated
[Reference: lab3_ver2 design](https://github.com/Raywang908/lab3_ver2)

## Design SPEC
* data_width = max 32 bits
* tap_width = max 32 bits
* addr_width = max 12 bits
* tap_number = max 32 units
* RAM_addr_width = max 12 bits
* RAM_capacity = data_width * 32 space
* addr 0x00: for [2] ap_idle, [1] ap_done, [0] ap_start
* addr 0x10: data_length
* addr 0x14: tap_length
* addr 0x80 ~ 0xFF: tap, data

## Design Approach
### Axi_Lite waveform
![lab3_axi_lite](https://hackmd.io/_uploads/BJecV3pkxl.png)
This timing diagram is correct; for further details, refer to the 1st version of this part.

### Axi_Lite block diagram
![image](https://hackmd.io/_uploads/r1EtHn6yel.png)
This part has been refined into simple logic, and I don't use an FSM in this version.

### Axi_Stream waveform 1st
![lab3_axi_stream_ver1](https://hackmd.io/_uploads/ByKQ83a1gx.png)
The concept of this waveform is similar to version 1: I did not decouple the input x from the core engine. A design based on this idea would perform poorly in Lab 4, because the CPU would have to wait for the core engine to finish its task before it could start another one.

### Axi_Stream waveform 2nd
![lab3_axi_stream_ver2_ult](https://hackmd.io/_uploads/By2Iw2ayee.png)
This version adds x_buffer and y_buffer to decouple the input/output bus from the engine, so the two systems can each do their own task.
:::warning
This waveform is not entirely correct, but it can be used as insight for the design.
:::
:::success
Although drawing waveforms before coding may take a lot of time and effort, it gives you a clear idea of the design details, so you won't need to revise your code frequently. The downside is that when the design becomes complicated, redrawing the waveforms can be very time-consuming.
:::

### Axi_Stream block diagram
![image](https://hackmd.io/_uploads/SyGqt3pygx.png)

### Overview block diagram
![image](https://hackmd.io/_uploads/ryU6YhTylg.png)

## 1st version Code
```verilog
module fir
#(
  parameter pADDR_WIDTH = 12,
  parameter pDATA_WIDTH = 32,
  parameter Tape_Num    = 12
)
(
  output wire                     awready,
  output wire                     wready,
  input  wire                     awvalid,
  input  wire [(pADDR_WIDTH-1):0] awaddr,
  input  wire                     wvalid,
  input  wire [(pDATA_WIDTH-1):0] wdata,
  output wire                     arready,
  input  wire                     rready,
  input  wire                     arvalid,
  input  wire [(pADDR_WIDTH-1):0] araddr,
  output wire                     rvalid,
  output wire [(pDATA_WIDTH-1):0] rdata,
  input  wire                     ss_tvalid,
  input  wire [(pDATA_WIDTH-1):0] ss_tdata,
  input  wire                     ss_tlast,
  output wire                     ss_tready,
  input  wire                     sm_tready,
  output wire                     sm_tvalid,
  output wire [(pDATA_WIDTH-1):0] sm_tdata,
  output wire                     sm_tlast,

  // bram for tap RAM
  output wire [3:0]               tap_WE,
  output wire                     tap_EN,
  output wire [(pDATA_WIDTH-1):0] tap_Di,
  output wire [(pADDR_WIDTH-1):0] tap_A,
  input  wire [(pDATA_WIDTH-1):0] tap_Do,

  // bram for data RAM
  output wire [3:0]               data_WE,
  output wire                     data_EN,
  output wire [(pDATA_WIDTH-1):0] data_Di,
  output wire [(pADDR_WIDTH-1):0] data_A,
  input  wire [(pDATA_WIDTH-1):0] data_Do,

  input  wire                     axis_clk,
  input  wire                     axis_rst_n
);

//========================== Declaration ==========================
//-------------------------- Axi-Lite -------------------------------
reg [(pADDR_WIDTH-1):0] araddr_tmp;
reg [(pADDR_WIDTH-1):0] araddr_next;
reg wready_tmp;
reg wready_next;
reg awready_tmp;
reg awready_next;
reg arready_tmp;
reg arready_next;
reg rvalid_tmp;
reg rvalid_next;
reg [(pDATA_WIDTH-1):0] rdata_tmp;
wire [4:0] condition; // used as the condition of the mux for rdata
reg [1:0] addr_define; // define address of the three register below into 2 bits
localparam [(pADDR_WIDTH-1):0] AP_ADDR = {{(pADDR_WIDTH-8){1'b0}}, 8'h00}; // address of ap_crtl register
localparam [(pADDR_WIDTH-1):0] DATA_ADDR = {{(pADDR_WIDTH-8){1'b0}}, 8'h10}; // address of data_length register
localparam [(pADDR_WIDTH-1):0] TAP_ADDR = {{(pADDR_WIDTH-8){1'b0}}, 8'h14}; // address of tap_length register localparam [(pADDR_WIDTH-1):0] INVALID_ADDR = {{(pADDR_WIDTH-1){1'b0}}, 1'b1}; //return invalid address ex. 12'h01 localparam [(pDATA_WIDTH-1):0] INVALID_DATA = {(pDATA_WIDTH){1'b1}}; // return invalid number ex. 32'hffffffff reg [(pDATA_WIDTH-1):0] data_length; reg [(pDATA_WIDTH-1):0] data_length_next; reg [(pDATA_WIDTH-1):0] tap_length; reg [(pDATA_WIDTH-1):0] tap_length_next; //-------------------------- ap_idle & ap_done & ap_start ------------------------------- reg ap_idle; reg ap_idle_next; reg ap_done; reg ap_done_next; wire ap_start; wire [2:0] ap_crtl; localparam VALID = 1'b1; localparam PULLD = 1'b0 ; // pull down = 1'b0 localparam PULLU = 1'b1 ; // pull up = 1'b1 //-------------------------- Axi-Stream SS (input X) ------------------------------- reg data_filled; wire data_filled_next; reg [(pDATA_WIDTH-1):0] data_cnt_now; reg [(pDATA_WIDTH-1):0] data_cnt_now_next; // we use this in the time diagram reg [(pDATA_WIDTH-1):0] x_buffer; reg [(pDATA_WIDTH-1):0] x_buffer_next; // we use this in the time diagram reg ss_tdone; wire ss_tdone_next; reg ss_tready_tmp; reg ss_tready_next; //-------------------------- FIR engine ------------------------------- reg [(pDATA_WIDTH-1):0] data_cnt_state; reg [(pDATA_WIDTH-1):0] data_cnt_state_next; // we use this in the time diagram reg [(pDATA_WIDTH-1):0] data_cnt_norm; reg [(pDATA_WIDTH-1):0] data_cnt_norm_next; // we use this in the time diagram reg [(pADDR_WIDTH-1):0] tapA_cnter; reg [(pADDR_WIDTH-1):0] tapA_cnter_next; // we use this in the time diagram reg [(pADDR_WIDTH-1):0] dataA_cnter; reg [(pADDR_WIDTH-1):0] dataA_cnter_next; // we use this in the time diagram wire [(pDATA_WIDTH-1):0] data_A_tmp; localparam [(pADDR_WIDTH-1):0] CNTERA_INVALID = 32; //max tap num reg data_write; reg write_sr; // data_write is shift right one cycle reg write_sr_next; reg [(pADDR_WIDTH-1):0] op_cnter1; wire 
[(pADDR_WIDTH-1):0] op_cnter1_next; reg [(pADDR_WIDTH-1):0] op_cnter2; wire [(pADDR_WIDTH-1):0] op_cnter2_next; reg [(pADDR_WIDTH-1):0] op_cnter3; // we use this in the time diagram wire [(pADDR_WIDTH-1):0] op_cnter3_next; reg [(pDATA_WIDTH-1):0] muli; reg [(pDATA_WIDTH-1):0] muli_next; reg mul_ctrl1; wire mul_ctrl1_next; reg mul_ctrl2; wire mul_ctrl2_next; reg sum_ctrl; wire sum_ctrl_next; reg [(pDATA_WIDTH-1):0] xi; wire [(pDATA_WIDTH-1):0] xi_next; reg [(pDATA_WIDTH-1):0] tapi; wire [(pDATA_WIDTH-1):0] tapi_next; reg [(pDATA_WIDTH-1):0] sumi; reg [(pDATA_WIDTH-1):0] sumi_next; reg [(pDATA_WIDTH-1):0] y_storage; reg [(pDATA_WIDTH-1):0] y_storage_next; reg [(pDATA_WIDTH-1):0] y_buffer; reg [(pDATA_WIDTH-1):0] y_buffer_next; reg y_locked; reg y_locked_next; reg y_change_valid; wire y_change_valid_next; reg sm_tdone; wire sm_tdone_next; //-------------------------- Axi-Stream SM (Output Y) ------------------------------- reg [(pDATA_WIDTH-1):0] y_cnter; reg [(pDATA_WIDTH-1):0] y_cnter_next; reg sm_tvalid_tmp; reg sm_tvalid_next; reg sm_tlast_tmp; reg sm_tlast_next; //========================== Function ========================== //-------------------------- Axi-Lite ------------------------------- // assign response signals assign wready = wready_tmp; assign awready = awready_tmp; assign arready = arready_tmp; assign rvalid = rvalid_tmp; // assign tap_RAM ouput signals assign tap_WE = (ap_idle && awready && wready) ? 4'b1111 : 4'b0000; assign tap_EN = (ap_idle) ? ((wready && awaddr[7]) || (rvalid && araddr_tmp[7])) : ((|data_cnt_state_next) || ss_tdone); // (|data_cnt_state) == !(data_cnt_state == 0) assign tap_Di = wdata; assign tap_A = (ap_idle) ? ((awvalid) ? 
awaddr[6:0] : araddr_tmp[6:0]) : (tapA_cnter_next << 2); always @(posedge axis_clk or negedge axis_rst_n) begin if (!axis_rst_n) begin data_length <= 0; tap_length <= 0; wready_tmp <= 0; awready_tmp <= 0; arready_tmp <= 0; rvalid_tmp <= 0; araddr_tmp <= 0; end else begin data_length <= data_length_next; tap_length <= tap_length_next; wready_tmp <= wready_next; awready_tmp <= awready_next; arready_tmp <= arready_next; rvalid_tmp <= rvalid_next; araddr_tmp <= araddr_next; end end always @(*) begin // managing wready and awready if (awvalid && wvalid && !awready) begin wready_next = PULLU; awready_next = PULLU; end else begin wready_next = PULLD; awready_next = PULLD; end // managing arready if (arvalid && !arready) begin arready_next = PULLU; end else begin arready_next = PULLD; end // managing rvalid if (arready) begin rvalid_next = PULLU; end else if (rready) begin rvalid_next = PULLD; end else begin rvalid_next = rvalid_tmp; end // managing araddr_tmp (store the value of araddr until shakehand) if (arvalid) begin araddr_next = araddr; end else if (rvalid && rready) begin araddr_next = INVALID_ADDR; end else begin araddr_next = araddr_tmp; end end // define address of the three register (ap_crtl, data_length, tap_length) into 2 bits // which is used in "condition" always @(*) begin if (araddr_tmp == AP_ADDR) begin addr_define = 2'b11; end else if (araddr_tmp == DATA_ADDR) begin addr_define = 2'b01; end else begin addr_define = 2'b10; end end assign condition = {araddr_tmp[7], ap_idle, rready && rvalid, addr_define}; always @(*) begin casez (condition) 5'b111??: begin rdata_tmp = tap_Do; end 5'b101??: begin rdata_tmp = INVALID_DATA; end 5'b0?101: begin rdata_tmp = data_length; end 5'b0?110: begin rdata_tmp = tap_length; end 5'b0?111: begin rdata_tmp = ap_crtl; end default: begin rdata_tmp = {(pDATA_WIDTH){1'b0}}; end endcase end assign rdata = rdata_tmp; always @(*) begin // define the flipflop of data_length if ((|tap_WE) && (awaddr == DATA_ADDR)) begin 
data_length_next = wdata; end else begin data_length_next = data_length; end // define the flipflop of tap_length if ((|tap_WE) && (awaddr == TAP_ADDR)) begin tap_length_next = wdata; end else begin tap_length_next = tap_length; end end //-------------------------- ap_idle & ap_done & ap_start ------------------------------- assign ap_crtl = {ap_idle, ap_done, ap_start}; // define the write in of ap_start assign ap_start = ((|tap_WE) && (awaddr == AP_ADDR)) ? wdata[0] : PULLD; always @(posedge axis_clk or negedge axis_rst_n) begin if (!axis_rst_n) begin ap_idle <= 1; ap_done <= 0; end else begin ap_idle <= ap_idle_next; ap_done <= ap_done_next; end end always @(*) begin // define the flipflop of ap_idle if (ap_start) begin ap_idle_next = PULLD; end else if (sm_tlast && sm_tready) begin ap_idle_next = PULLU; end else begin ap_idle_next = ap_idle; end // define the flipflop of ap_done if (sm_tlast && sm_tready) begin ap_done_next = PULLU; end else if (ap_idle && ap_done && (araddr_tmp == AP_ADDR) && rready && rvalid) begin ap_done_next = PULLD; end else begin ap_done_next = ap_done; end end //-------------------------- Axi-Stream SS (input X) ------------------------------- always @(posedge axis_clk or negedge axis_rst_n) begin if (!axis_rst_n) begin data_cnt_now <= 0; x_buffer <= INVALID_DATA; data_filled <= 0; ss_tdone <= 0; ss_tready_tmp <= 0; end else begin data_cnt_now <= data_cnt_now_next; x_buffer <= x_buffer_next; data_filled <= data_filled_next; ss_tdone <= ss_tdone_next; ss_tready_tmp <= ss_tready_next; end end assign ss_tready = ss_tready_tmp; // used as a switch to turn off data_EN and tap_EN assign ss_tdone_next = (ss_tready && ss_tlast) ? PULLU : (sm_tready && sm_tlast) ? PULLD : ss_tdone; // data_filled = 1 if data_RAM is filled assign data_filled_next = (sm_tdone) ? PULLD : (data_cnt_now_next == tap_length) && !ap_idle ? 
PULLU : data_filled; // define data_cnt_now_next and x_buffer_next always @(*) begin if (ap_idle) begin data_cnt_now_next = {(pDATA_WIDTH-1){1'b0}}; x_buffer_next = INVALID_DATA; end else if (ss_tvalid && ss_tready) begin data_cnt_now_next = data_cnt_now + 1; x_buffer_next = ss_tdata; end else if (sm_tdone) begin // ss_tdone data_cnt_now_next = {(pDATA_WIDTH-1){1'b0}}; x_buffer_next = INVALID_DATA; end else begin data_cnt_now_next = data_cnt_now; x_buffer_next = x_buffer; end end // define ss_tready_next always @(*) begin if (!data_filled) begin if (ss_tvalid && !ss_tready && !ap_idle) begin ss_tready_next = PULLU; end else begin ss_tready_next = PULLD; end end else if ((data_cnt_now_next == data_cnt_state_next) && ss_tvalid && !ss_tready && !ap_idle && !ss_tdone) begin ss_tready_next = PULLU; end else begin ss_tready_next = PULLD; end end //-------------------------- FIR engine ------------------------------- always @(posedge axis_clk or negedge axis_rst_n) begin if (!axis_rst_n) begin data_cnt_state <= 0; data_cnt_norm <= 0; tapA_cnter <= CNTERA_INVALID; dataA_cnter <= CNTERA_INVALID; write_sr <= 0; op_cnter1 <= CNTERA_INVALID; op_cnter2 <= CNTERA_INVALID; op_cnter3 <= CNTERA_INVALID; mul_ctrl1 <= 0; mul_ctrl2 <= 0; xi <= INVALID_DATA; tapi <= INVALID_DATA; sum_ctrl <= 0; sumi <= 0; muli <= 0; y_locked <= 0; y_storage <= INVALID_DATA; y_change_valid <= 0; y_buffer <= INVALID_DATA; sm_tdone <= 0; end else begin data_cnt_state <= data_cnt_state_next; data_cnt_norm <= data_cnt_norm_next; tapA_cnter <= tapA_cnter_next; dataA_cnter <= dataA_cnter_next; write_sr <= write_sr_next; op_cnter1 <= op_cnter1_next; op_cnter2 <= op_cnter2_next; op_cnter3 <= op_cnter3_next; mul_ctrl1 <= mul_ctrl1_next; mul_ctrl2 <= mul_ctrl2_next; xi <= xi_next; tapi <= tapi_next; sum_ctrl <= sum_ctrl_next; sumi <= sumi_next; muli <= muli_next; y_locked <= y_locked_next; y_storage <= y_storage_next; y_change_valid <= y_change_valid_next; y_buffer <= y_buffer_next; sm_tdone <= sm_tdone_next; end 
end // define data_cnt_state_next always @(*) begin if (ap_idle) begin // oridinally no ap_idle data_cnt_state_next = 0; end else if ((data_cnt_now_next == 1) && (data_cnt_now == 0)) begin data_cnt_state_next = 1; end else if (tapA_cnter == 1 && !(y_locked || y_locked_next) && (data_cnt_state < data_cnt_now_next)) begin data_cnt_state_next = data_cnt_state + 1; end else begin data_cnt_state_next = data_cnt_state; end end // define data_cnt_norm_next, which is the loop of 1 to tap_lentgh - 1 always @(*) begin if (ap_idle) begin // sm_tdone data_cnt_norm_next = 0; end else if (!(data_cnt_state == data_cnt_state_next)) begin if (data_cnt_norm == tap_length) begin data_cnt_norm_next = 1; end else begin data_cnt_norm_next = data_cnt_norm + 1; end end else begin data_cnt_norm_next = data_cnt_norm; end end // define tapA_cnter_next always @(*) begin if (ap_idle) begin // sm_tdone tapA_cnter_next = CNTERA_INVALID; end else if (data_cnt_state_next == 0) begin tapA_cnter_next = tapA_cnter; end else if (!(data_cnt_state == data_cnt_state_next)) begin tapA_cnter_next = 0; end else if (!data_write && (tapA_cnter == 0)) begin if (data_cnt_state_next < tap_length) begin tapA_cnter_next = data_cnt_state_next; end else begin tapA_cnter_next = tap_length - 1; end end else if (tapA_cnter > 1 && !data_write) begin //only tapA_cnter > 1 tapA_cnter_next = tapA_cnter - 1; end else begin tapA_cnter_next = tapA_cnter; end end // define dataA_cnter_next always @(*) begin if (ap_idle) begin // ss_tdone dataA_cnter_next = CNTERA_INVALID; end else if (data_cnt_state_next == 0) begin dataA_cnter_next = dataA_cnter; end else if (!(data_cnt_state == data_cnt_state_next)) begin dataA_cnter_next = data_cnt_norm_next - 1; end else if (!(tapA_cnter_next == tapA_cnter) && !data_write) begin if (data_cnt_state_next < tap_length) begin if (tapA_cnter == 0) begin dataA_cnter_next = 0; end else begin dataA_cnter_next = dataA_cnter + 1; end end else begin if (tapA_cnter == 0) begin if (data_cnt_state_next 
% tap_length + 1 == tap_length) begin dataA_cnter_next = 0; end else begin dataA_cnter_next = data_cnt_state_next % tap_length + 1; end end else begin if (dataA_cnter + 1 == tap_length) begin dataA_cnter_next = 0; end else begin dataA_cnter_next = dataA_cnter + 1; end end end end else begin dataA_cnter_next = dataA_cnter; end end // define write_sr_next always @(*) begin if (!(data_cnt_state == data_cnt_state_next) && !(data_cnt_state_next == data_cnt_now_next) && ss_tready) begin write_sr_next = PULLU; end else begin write_sr_next = PULLD; end end // define data_write always @(*) begin if (!data_filled) begin if (ss_tready && (write_sr_next == PULLU)) begin data_write = PULLD; end else if (write_sr) begin data_write = PULLU; end else begin data_write = ss_tready; end end else if ((data_cnt_state_next > tap_length) && !(data_cnt_state == data_cnt_state_next)) begin data_write = PULLU; end else begin data_write = PULLD; end end // assign data_RAM ouput signals assign data_WE = {(4){data_write}}; assign data_EN = (|data_cnt_state_next) || ss_tdone; assign data_Di = x_buffer_next; assign data_A_tmp = (data_write && !data_filled) ? (data_cnt_now_next - 1) << 2 : dataA_cnter_next << 2; assign data_A = data_A_tmp[(pADDR_WIDTH-1):0]; // bcs data_cnt_now_next is [(pDATA_WIDTH-1):0] // assign xi and tapi for pipelining assign xi_next = data_Do; assign tapi_next = tap_Do; // assign op_cnter_next assign op_cnter1_next = tapA_cnter; assign op_cnter2_next = op_cnter1; assign op_cnter3_next = op_cnter2; // define mul_ctrl assign mul_ctrl1_next = !(tapA_cnter_next == tapA_cnter) ? 
PULLU : PULLD; assign mul_ctrl2_next = mul_ctrl1; // define sum_ctrl assign sum_ctrl_next = mul_ctrl2; // manage muli: do multiply operation always @(*) begin if (mul_ctrl2) begin muli_next = xi * tapi; end else begin muli_next = muli; end end // manage sumi: do sum operation always @(*) begin if (sum_ctrl && (op_cnter3 == CNTERA_INVALID)) begin sumi_next = muli; end else if (sum_ctrl && (op_cnter3 == 0)) begin sumi_next = muli; end else if (sum_ctrl) begin sumi_next = muli + sumi; end else begin sumi_next = sumi; end end // define y_locked always @(*) begin if (op_cnter2 == 1 && op_cnter1 == 0 && sm_tvalid && !sm_tready) begin y_locked_next = PULLU; end else if (sm_tready) begin y_locked_next = PULLD; end else begin y_locked_next = y_locked; end end // assign sm_tdone assign sm_tdone_next = sm_tlast && sm_tready && ~sm_tdone; // define y_storage always @(*) begin if (op_cnter3 == 1 && op_cnter2 == 0 && y_locked) begin y_storage_next = sumi_next; end else if (sm_tdone) begin y_storage_next = INVALID_DATA; end else begin y_storage_next = y_storage; end end // define y_change_valid assign y_change_valid_next = ((y_locked_next == 0) && (y_locked == 1)) ? 
    PULLU : PULLD;

// define y_buffer
always @(*) begin
  if (sm_tdone || ap_idle) begin
    y_buffer_next = INVALID_DATA;
    y_cnter_next = 0;
  end else if (op_cnter3 == CNTERA_INVALID && op_cnter2 == 0) begin
    y_buffer_next = sumi_next;
    y_cnter_next = y_cnter + 1;
  end else if (op_cnter3 == 1 && op_cnter2 == 0 && !y_locked) begin
    y_buffer_next = sumi_next;
    y_cnter_next = y_cnter + 1;
  end else if (y_change_valid) begin
    y_buffer_next = y_storage;
    y_cnter_next = y_cnter + 1;
  end else begin
    y_buffer_next = y_buffer;
    y_cnter_next = y_cnter;
  end
end

//-------------------------- Axi-Stream SM (Output Y) -------------------------------
always @(posedge axis_clk or negedge axis_rst_n) begin
  if (!axis_rst_n) begin
    sm_tvalid_tmp <= 0;
    sm_tlast_tmp <= 0;
    y_cnter <= 0;
  end else begin
    sm_tvalid_tmp <= sm_tvalid_next;
    sm_tlast_tmp <= sm_tlast_next;
    y_cnter <= y_cnter_next;
  end
end

// assign to sm_tvalid
assign sm_tvalid = sm_tvalid_tmp;

// define sm_tvalid
always @(*) begin
  if (op_cnter3 == CNTERA_INVALID && op_cnter2 == 0) begin
    sm_tvalid_next = PULLU;
  end else if (op_cnter3 == 1 && op_cnter2 == 0 && !y_locked) begin
    sm_tvalid_next = PULLU;
  end else if (y_change_valid && !ap_idle) begin
    sm_tvalid_next = PULLU;
  end else if (sm_tready) begin
    sm_tvalid_next = PULLD;
  end else begin
    sm_tvalid_next = sm_tvalid_tmp;
  end
end

// define sm_tlast
always @(*) begin
  if (!ap_idle && y_cnter_next == data_length && sm_tvalid_next == PULLU) begin
    sm_tlast_next = PULLU;
  end else if (sm_tvalid_next == PULLD) begin
    sm_tlast_next = PULLD;
  end else begin
    sm_tlast_next = sm_tlast_tmp;
  end
end

// assign sm_tlast
assign sm_tlast = sm_tlast_tmp;

// assign to sm_tdata
assign sm_tdata = (sm_tready) ? y_buffer_next : INVALID_DATA;

endmodule
```
:::danger
Timing violation
![image](https://hackmd.io/_uploads/H1iU6vskxl.png)
![image](https://hackmd.io/_uploads/Hy0Ppviyxx.png)
The likely cause is too many if-else statements, which synthesize into multi-stage MUXes and result in a bad timing path.
As follows:
![image](https://hackmd.io/_uploads/SymMRvo1gl.png)
![image](https://hackmd.io/_uploads/ryRMCwjklx.png)
:::

## Final version Code Analysis
### Axi_Lite
![image](https://hackmd.io/_uploads/Hkca3hT1xg.png)
```verilog
//-------------------------- Axi-Lite -------------------------------
// assign response signals
assign wready = wready_tmp;
assign awready = awready_tmp;
assign arready = arready_tmp;
assign rvalid = rvalid_tmp;

// assign tap_RAM output signals
assign tap_WE = (ap_idle && awready && wready) ? 4'b1111 : 4'b0000;
assign tap_EN = (ap_idle) ? ((wready && awaddr[7]) || (rvalid && araddr_tmp[7])) : ((|data_cnt_state_next) || ss_tdone);
// (|data_cnt_state) == !(data_cnt_state == 0)
assign tap_Di = wdata;
assign tap_A = (ap_idle) ? ((awvalid) ? awaddr[6:0] : araddr_tmp[6:0]) : (tapA_cnter_next << 2);

always @(posedge axis_clk or negedge axis_rst_n) begin
  if (!axis_rst_n) begin
    data_length <= 0;
    tap_length <= 0;
    wready_tmp <= 0;
    awready_tmp <= 0;
    arready_tmp <= 0;
    rvalid_tmp <= 0;
    araddr_tmp <= 0;
  end else begin
    data_length <= data_length_next;
    tap_length <= tap_length_next;
    wready_tmp <= wready_next;
    awready_tmp <= awready_next;
    arready_tmp <= arready_next;
    rvalid_tmp <= rvalid_next;
    araddr_tmp <= araddr_next;
  end
end

always @(*) begin
  // managing wready and awready
  if (awvalid && wvalid && !awready) begin
    wready_next = PULLU;
    awready_next = PULLU;
  end else begin
    wready_next = PULLD;
    awready_next = PULLD;
  end

  // managing arready
  if (arvalid && !arready) begin
    arready_next = PULLU;
  end else begin
    arready_next = PULLD;
  end

  // managing rvalid
  if (arready) begin
    rvalid_next = PULLU;
  end else if (rready) begin
    rvalid_next = PULLD;
  end else begin
    rvalid_next = rvalid_tmp;
  end

  // managing araddr_tmp (store the value of araddr until handshake)
  if (arvalid) begin
    araddr_next = araddr;
  end else if (rvalid && rready) begin
    araddr_next = INVALID_ADDR;
  end else begin
    araddr_next = araddr_tmp;
  end
end

// define address of the three registers (ap_crtl, data_length, tap_length) into 2 bits
// which is used in "condition"
always @(*) begin
  if (araddr_tmp == AP_ADDR) begin
    addr_define = 2'b11;
  end else if (araddr_tmp == DATA_ADDR) begin
    addr_define = 2'b01;
  end else begin
    addr_define = 2'b10;
  end
end

assign condition = {araddr_tmp[7], ap_idle, rready && rvalid, addr_define};

always @(*) begin
  casez (condition)
    5'b111??: begin
      rdata_tmp = tap_Do;
    end
    5'b101??: begin
      rdata_tmp = INVALID_DATA;
    end
    5'b0?101: begin
      rdata_tmp = data_length;
    end
    5'b0?110: begin
      rdata_tmp = tap_length;
    end
    5'b0?111: begin
      rdata_tmp = ap_crtl;
    end
    default: begin
      rdata_tmp = {(pDATA_WIDTH){1'b0}};
    end
  endcase
end

assign rdata = rdata_tmp;

always @(*) begin
  // define the flipflop of data_length
  if ((|tap_WE) && (awaddr == DATA_ADDR)) begin
    data_length_next = wdata;
  end else begin
    data_length_next = data_length;
  end

  // define the flipflop of tap_length
  if ((|tap_WE) && (awaddr == TAP_ADDR)) begin
    tap_length_next = wdata;
  end else begin
    tap_length_next = tap_length;
  end
end
```

### Ap_ctrl
![image](https://hackmd.io/_uploads/SywaT26kxe.png)
```verilog
//-------------------------- ap_idle & ap_done & ap_start -------------------------------
assign ap_crtl = {ap_idle, ap_done, ap_start};

// define the write in of ap_start
assign ap_start = ((|tap_WE) && (awaddr == AP_ADDR)) ?
    wdata[0] : PULLD;

always @(posedge axis_clk or negedge axis_rst_n) begin
  if (!axis_rst_n) begin
    ap_idle <= 1;
    ap_done <= 0;
  end else begin
    ap_idle <= ap_idle_next;
    ap_done <= ap_done_next;
  end
end

always @(*) begin
  // define the flipflop of ap_idle
  if (ap_start) begin
    ap_idle_next = PULLD;
  end else if (sm_tlast && sm_tready) begin
    ap_idle_next = PULLU;
  end else begin
    ap_idle_next = ap_idle;
  end

  // define the flipflop of ap_done
  if (sm_tlast && sm_tready) begin
    ap_done_next = PULLU;
  end else if (ap_idle && ap_done && (araddr_tmp == AP_ADDR) && rready && rvalid) begin
    ap_done_next = PULLD;
  end else begin
    ap_done_next = ap_done;
  end
end
```

### Axi_Stream SS (input)
![image](https://hackmd.io/_uploads/SJft1TT1gx.png)
![image](https://hackmd.io/_uploads/Hy5YlTp1xx.png)
We can see that the 31st x has already been input, while the engine has only finished 6 stages of calculation.
```verilog
//-------------------------- Axi-Stream SS (input X) -------------------------------
always @(posedge axis_clk or negedge axis_rst_n) begin
  if (!axis_rst_n) begin
    data_cnt_now <= 0;
    x_buffer <= INVALID_DATA;
    data_filled <= 0;
    ss_tdone <= 0;
    ss_tready_tmp <= 0;
  end else begin
    data_cnt_now <= data_cnt_now_next;
    x_buffer <= x_buffer_next;
    data_filled <= data_filled_next;
    ss_tdone <= ss_tdone_next;
    ss_tready_tmp <= ss_tready_next;
  end
end

assign ss_tready = ss_tready_tmp;

// used as a switch to turn off data_EN and tap_EN
assign ss_tdone_next = (ss_tready && ss_tlast) ? PULLU : (sm_tready && sm_tlast) ? PULLD : ss_tdone;

// data_filled = 1 if data_RAM is filled
assign data_filled_next = (sm_tdone) ? PULLD : (data_cnt_now_next == tap_length) && !ap_idle ?
    PULLU : data_filled;

// define data_cnt_now_next and x_buffer_next
always @(*) begin
  if (ap_idle) begin
    data_cnt_now_next = {(pDATA_WIDTH){1'b0}};
    x_buffer_next = INVALID_DATA;
  end else if (ss_tvalid && ss_tready) begin
    data_cnt_now_next = data_cnt_now + 1;
    x_buffer_next = ss_tdata;
  end else if (sm_tdone) begin // ss_tdone
    data_cnt_now_next = {(pDATA_WIDTH){1'b0}};
    x_buffer_next = INVALID_DATA;
  end else begin
    data_cnt_now_next = data_cnt_now;
    x_buffer_next = x_buffer;
  end
end

// define ss_tready_next
always @(*) begin
  if ((!data_filled || ((data_cnt_now_next == data_cnt_state_next) && !ss_tdone)) && ss_tvalid && !ss_tready && !ap_idle) begin
    ss_tready_next = PULLU;
  end else begin
    ss_tready_next = PULLD;
  end
end
```

### Fir engine
![image](https://hackmd.io/_uploads/Skfdm6TJxx.png)
```verilog
//-------------------------- FIR engine -------------------------------
always @(posedge axis_clk or negedge axis_rst_n) begin
  if (!axis_rst_n) begin
    data_cnt_state <= 0;
    data_cnt_norm <= 0;
    tapA_cnter <= CNTERA_INVALID;
    dataA_cnter <= CNTERA_INVALID;
    cnter_indata <= 0;
    write_sr <= 0;
    op_cnter1 <= CNTERA_INVALID;
    op_cnter2 <= CNTERA_INVALID;
    op_cnter3 <= CNTERA_INVALID;
    mul_ctrl1 <= 0;
    mul_ctrl2 <= 0;
    xi <= INVALID_DATA;
    tapi <= INVALID_DATA;
    sum_ctrl <= 0;
    sumi <= 0;
    muli <= 0;
    y_locked <= 0;
    y_storage <= INVALID_DATA;
    y_change_valid <= 0;
    y_buffer <= INVALID_DATA;
    sm_tdone <= 0;
  end else begin
    data_cnt_state <= data_cnt_state_next;
    data_cnt_norm <= data_cnt_norm_next;
    tapA_cnter <= tapA_cnter_next;
    dataA_cnter <= dataA_cnter_next;
    cnter_indata <= cnter_indata_next;
    write_sr <= write_sr_next;
    op_cnter1 <= op_cnter1_next;
    op_cnter2 <= op_cnter2_next;
    op_cnter3 <= op_cnter3_next;
    mul_ctrl1 <= mul_ctrl1_next;
    mul_ctrl2 <= mul_ctrl2_next;
    xi <= xi_next;
    tapi <= tapi_next;
    sum_ctrl <= sum_ctrl_next;
    sumi <= sumi_next;
    muli <= muli_next;
    y_locked <= y_locked_next;
    y_storage <= y_storage_next;
    y_change_valid <= y_change_valid_next;
    y_buffer <= y_buffer_next;
    sm_tdone <=
        sm_tdone_next;
  end
end

// define data_cnt_state_next
always @(*) begin
  if (ap_idle) begin // originally no ap_idle
    data_cnt_state_next = 0;
  end else if ((data_cnt_now_next == 1) && (data_cnt_now == 0)) begin
    data_cnt_state_next = 1;
  end else if (tapA_cnter == 1 && !(y_locked || y_locked_next) && (data_cnt_state < data_cnt_now_next)) begin
    data_cnt_state_next = data_cnt_state + 1;
  end else begin
    data_cnt_state_next = data_cnt_state;
  end
end

// define data_cnt_norm_next, which loops over 1 to tap_length
always @(*) begin
  if (ap_idle) begin // sm_tdone
    data_cnt_norm_next = 0;
  end else if (!(data_cnt_state == data_cnt_state_next) && (data_cnt_norm == tap_length)) begin
    data_cnt_norm_next = 1;
  end else if (!(data_cnt_state == data_cnt_state_next)) begin
    data_cnt_norm_next = data_cnt_norm + 1;
  end else begin
    data_cnt_norm_next = data_cnt_norm;
  end
end

// define tapA_cnter_next
always @(*) begin
  if (ap_idle) begin // sm_tdone
    tapA_cnter_next = CNTERA_INVALID;
  end else if (!(data_cnt_state == data_cnt_state_next)) begin
    tapA_cnter_next = 0;
  end else if (!data_write && (tapA_cnter == 0) && (data_cnt_state_next < tap_length)) begin
    tapA_cnter_next = data_cnt_state_next;
  end else if (!data_write && (tapA_cnter == 0)) begin
    tapA_cnter_next = tap_length - 1;
  end else if (tapA_cnter > 1 && !data_write && !(tapA_cnter == CNTERA_INVALID)) begin // only tapA_cnter > 1
    tapA_cnter_next = tapA_cnter - 1;
  end else begin
    tapA_cnter_next = tapA_cnter;
  end
end

// define cnter_indata
always @(*) begin
  if (ap_idle || (!(data_cnt_state == data_cnt_state_next) && (cnter_indata == tap_length - 1))) begin
    cnter_indata_next = 0;
  end else if (!(data_cnt_state == data_cnt_state_next) && data_cnt_state_next >= tap_length) begin
    cnter_indata_next = cnter_indata + 1;
  end else begin
    cnter_indata_next = cnter_indata;
  end
end

// define dataA_cnter_next
always @(*) begin
  if (ap_idle) begin // ss_tdone
    dataA_cnter_next = CNTERA_INVALID;
  end else if (!(data_cnt_state == data_cnt_state_next)) begin
    dataA_cnter_next = data_cnt_norm_next - 1;
  end else if (!data_write && ((tapA_cnter == 0 && cnter_indata == 0) || (!(tapA_cnter == 0) && !(tapA_cnter_next == tapA_cnter) && (dataA_cnter + 1 == tap_length)))) begin
    dataA_cnter_next = 0;
  end else if (tapA_cnter == 0 && !data_write) begin
    dataA_cnter_next = cnter_indata_next;
  end else if (!(tapA_cnter_next == tapA_cnter) && !data_write) begin
    dataA_cnter_next = dataA_cnter + 1;
  end else begin
    dataA_cnter_next = dataA_cnter;
  end
end

// define write_sr_next
always @(*) begin
  if (!(data_cnt_state == data_cnt_state_next) && !(data_cnt_state_next == data_cnt_now_next) && ss_tready) begin
    write_sr_next = PULLU;
  end else begin
    write_sr_next = PULLD;
  end
end

// define data_write
always @(*) begin
  if ((!data_filled && write_sr) || ((data_cnt_state_next > tap_length) && !(data_cnt_state == data_cnt_state_next))) begin
    data_write = PULLU;
  end else if (!data_filled && !(write_sr_next == PULLU)) begin
    data_write = ss_tready;
  end else begin
    data_write = PULLD;
  end
end
```
* `data_cnt_now_next`: represents the nth input.
* `data_cnt_state_next`: represents the nth stage of calculation. The first tap and data cnter of a stage supply the final term to add for the (n-1)th stage, meaning that the rest of the summation for the nth input x is pre-calculated during the (n-1)th stage.
* `tapA_cnter_next` and `dataA_cnter_next`: the cnters that drive the address inputs `tap_A` and `data_A`.
* `data_cnt_norm_next`: the normalized state, which loops over `1 ~ tap_length` and is used to assign the cnter addresses.
* `cnter_indata`: a cnter inside `dataA_cnter_next`. `dataA_cnter_next` is not a simple cnter; it is composed of `cnter_indata` and other conditions.
* `write_sr_next`: the first cnter address of a stage performs the final summation of the previous stage, and the change of `data_cnt_state_next` determines when the cnter changes. Therefore, we have to ensure that `data_write` writes to the first cnter address only when it is writing the input x of this stage.
* `data_write`: writes the value in `x_buffer` into SRAM.

![image](https://hackmd.io/_uploads/r1z7cp6klx.png)
```verilog
// assign data_RAM output signals
assign data_WE = {(4){data_write}};
assign data_EN = (|data_cnt_state_next) || ss_tdone;
assign data_Di = x_buffer_next;
assign data_A_tmp = (data_write && !data_filled) ? (data_cnt_now_next - 1) << 2 : dataA_cnter_next << 2;
assign data_A = data_A_tmp[(pADDR_WIDTH-1):0]; // bcs data_cnt_now_next is [(pDATA_WIDTH-1):0]

// assign xi and tapi for pipelining
assign xi_next = data_Do;
assign tapi_next = tap_Do;

// assign op_cnter_next
assign op_cnter1_next = tapA_cnter;
assign op_cnter2_next = op_cnter1;
assign op_cnter3_next = op_cnter2;

// define mul_ctrl
assign mul_ctrl1_next = !(tapA_cnter_next == tapA_cnter) ? PULLU : PULLD;
assign mul_ctrl2_next = mul_ctrl1;

// define sum_ctrl
assign sum_ctrl_next = mul_ctrl2;

// manage muli: do multiply operation
always @(*) begin
  if (mul_ctrl2) begin
    muli_next = xi * tapi;
  end else begin
    muli_next = muli;
  end
end

// manage sumi: do sum operation
always @(*) begin
  if ((sum_ctrl && (op_cnter3 == CNTERA_INVALID)) || (sum_ctrl && (op_cnter3 == 0))) begin
    sumi_next = muli;
  end else if (sum_ctrl) begin
    sumi_next = muli + sumi;
  end else begin
    sumi_next = sumi;
  end
end

// define y_locked
always @(*) begin
  if (op_cnter2 == 1 && op_cnter1 == 0 && sm_tvalid && !sm_tready) begin
    y_locked_next = PULLU;
  end else if (sm_tready) begin
    y_locked_next = PULLD;
  end else begin
    y_locked_next = y_locked;
  end
end

// assign sm_tdone
assign sm_tdone_next = sm_tlast && sm_tready && ~sm_tdone;

// define y_storage
always @(*) begin
  if (op_cnter3 == 1 && op_cnter2 == 0 && y_locked) begin
    y_storage_next = sumi_next;
  end else if (sm_tdone) begin
    y_storage_next = INVALID_DATA;
  end else begin
    y_storage_next = y_storage;
  end
end

// define y_change_valid
assign
    y_change_valid_next = ((y_locked_next == 0) && (y_locked == 1)) ? PULLU : PULLD;

// define y_buffer
always @(*) begin
  if (sm_tdone || ap_idle) begin
    y_buffer_next = INVALID_DATA;
    y_cnter_next = 0;
  end else if ((op_cnter3 == CNTERA_INVALID && op_cnter2 == 0) || (op_cnter3 == 1 && op_cnter2 == 0 && !y_locked)) begin
    y_buffer_next = sumi_next;
    y_cnter_next = y_cnter + 1;
  end else if (y_change_valid) begin
    y_buffer_next = y_storage;
    y_cnter_next = y_cnter + 1;
  end else begin
    y_buffer_next = y_buffer;
    y_cnter_next = y_cnter;
  end
end
```
* `mul_ctrl2` and `sum_ctrl`: indicate when the multiplier and the adder operate; both are determined by changes of `tapA_cnter`.
* `y_locked_next`: when y is ready to be output but the CPU has not responded, `y_locked` is pulled up and stalls `data_cnt_state_next`.
* `y_storage_next`: stores a y that should be sent but that the CPU has not accepted yet, so the engine can carry on with the pre-calculation.
* `y_change_valid`: determines whether `y_buffer_next` takes the value of the final summation or the value of `y_storage`.

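The roles of `y_buffer`, `y_storage`, and the stall described above resemble a two-entry output buffer. The following is a minimal standalone sketch of that idea, not the logic used in this lab's `fir` module. The module name `y_skid_sketch`, the ports `y_new_valid`, `y_new`, and `engine_stall`, and the `main_*`/`spare_*` registers are all hypothetical, and the sketch assumes the engine produces no new result while `engine_stall` is high.

```verilog
// Hypothetical two-entry output buffer (illustration only).
// main_*  plays the role of y_buffer  (drives the stream output)
// spare_* plays the role of y_storage (holds one extra result during a stall)
module y_skid_sketch #(
  parameter W = 32
) (
  input  wire         clk,
  input  wire         rst_n,
  input  wire         y_new_valid,  // engine finished one result this cycle
  input  wire [W-1:0] y_new,
  input  wire         sm_tready,
  output wire         sm_tvalid,
  output wire [W-1:0] sm_tdata,
  output wire         engine_stall  // both entries occupied
);
  reg         main_full;
  reg         spare_full;
  reg [W-1:0] main_data;
  reg [W-1:0] spare_data;

  assign sm_tvalid    = main_full;
  assign sm_tdata     = main_data;
  assign engine_stall = spare_full;

  // one result leaves the buffer when valid and ready are both high
  wire pop = main_full && sm_tready;

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      main_full  <= 1'b0;
      spare_full <= 1'b0;
      main_data  <= {W{1'b0}};
      spare_data <= {W{1'b0}};
    end else if (spare_full) begin
      // engine is assumed stalled: only drain, spare moves into main
      if (pop) begin
        main_data  <= spare_data;
        spare_full <= 1'b0;
      end
    end else if (main_full && !pop) begin
      // output is blocked: park a new result in the spare entry
      if (y_new_valid) begin
        spare_data <= y_new;
        spare_full <= 1'b1;
      end
    end else begin
      // main is empty or is being read this cycle
      if (y_new_valid) begin
        main_data <= y_new;
        main_full <= 1'b1;
      end else if (pop) begin
        main_full <= 1'b0;
      end
    end
  end
endmodule
```

The spare entry is what lets the engine finish one more result while the receiver is not ready. Only when both entries are occupied does the engine have to stop.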
### Axi-Stream SM (output)
![image](https://hackmd.io/_uploads/Hy63n6pJxl.png)
```verilog
//-------------------------- Axi-Stream SM (Output Y) -------------------------------
always @(posedge axis_clk or negedge axis_rst_n) begin
  if (!axis_rst_n) begin
    sm_tvalid_tmp <= 0;
    sm_tlast_tmp <= 0;
    y_cnter <= 0;
  end else begin
    sm_tvalid_tmp <= sm_tvalid_next;
    sm_tlast_tmp <= sm_tlast_next;
    y_cnter <= y_cnter_next;
  end
end

// assign to sm_tvalid
assign sm_tvalid = sm_tvalid_tmp;

// define sm_tvalid
always @(*) begin
  if ((op_cnter3 == CNTERA_INVALID && op_cnter2 == 0) || (op_cnter3 == 1 && op_cnter2 == 0 && !y_locked) || (y_change_valid && !ap_idle)) begin
    sm_tvalid_next = PULLU;
  end else if (sm_tready) begin
    sm_tvalid_next = PULLD;
  end else begin
    sm_tvalid_next = sm_tvalid_tmp;
  end
end

// define sm_tlast
always @(*) begin
  if (!ap_idle && y_cnter_next == data_length && sm_tvalid_next == PULLU) begin
    sm_tlast_next = PULLU;
  end else if (sm_tvalid_next == PULLD) begin
    sm_tlast_next = PULLD;
  end else begin
    sm_tlast_next = sm_tlast_tmp;
  end
end

// assign sm_tlast
assign sm_tlast = sm_tlast_tmp;

// assign to sm_tdata
assign sm_tdata = (sm_tready) ? y_buffer_next : INVALID_DATA;
```

## Waveform overview
![image](https://hackmd.io/_uploads/rkFmTa6yee.png)
![image](https://hackmd.io/_uploads/Bkkq6p6Jex.png)

## Syn Report
### Time
![image](https://hackmd.io/_uploads/ryR106pJll.png)

### Area
![image](https://hackmd.io/_uploads/r1bW0paJgg.png)