# Lab_3 FIR
[Reference: lab_3 design](https://github.com/Raywang908/lab_3)

## Design SPEC
* data_width = max 32 bit
* tap_width = max 32 bit
* addr_width = max 12 bit
* tap_number = max 32 unit
* RAM_addr_width = max 12 bit
* RAM_capacity = data_width * 32 space
* addr 0x00: for [2] ap_idle, [1] ap_done, [0] ap_start
* addr 0x10: data_length
* addr 0x14: tap_length
* addr 0x80 ~ 0xFF: tap, data

## Introduction
[Reference: lab_3 workbook](https://drive.google.com/file/d/1PSqW4qURLvSZxRDxA4NAYPYXn8ixE6ib/view?usp=drive_link)

The main operation of this system is convolution: y[t] = ∑(h[i] ⋅ x[t−i])

Besides the convolution, this lab covers AXI-Lite and AXI-Stream. We use Verilog to simulate both protocols in order to become familiar with the AXI bus.

![f9e7921e-87e7-4899-aaf4-30fee567117f](https://hackmd.io/_uploads/Hkc5Ndznkl.jpg)

The image above is a rough concept (not entirely accurate) of how this protocol works. The **Testbench** acts as the **master** and **Fir** acts as the **slave**. The testbench sends the coefficients over AXI_Lite because AXI_Lite carries an address as well. AXI_Stream can only carry data, so the testbench sends data x and receives data y over AXI_Stream.

## Design Approach
The write and read protocols are as follows:

![image](https://hackmd.io/_uploads/HJ0nDqM31l.png) ![image](https://hackmd.io/_uploads/rJxCv9G3kl.png)

1. Write Tap and Coefficient

   Only when the testbench has raised both **awvalid** `0 -> 1` and **wvalid** `0 -> 1` should fir respond with **wready** `0 -> 1` and let the data be written into the tap_RAM. Both orders are handled: **wvalid** rising first and **awvalid** rising first.

   ![image](https://hackmd.io/_uploads/r104L9f21e.png) ![image](https://hackmd.io/_uploads/r1Kz89f31g.png)

2. Read Tap and Coefficient

   When **arvalid** goes `0 -> 1`, fir should respond with **arready** `0 -> 1`. After that, **araddr** should be stored in a register, since **rready** and **rvalid** may not handshake immediately. From the read protocol picture, **rvalid** should rise to 1 once **arvalid** and **arready** handshake, but **rready** can rise to 1 at any time. This situation must be handled.

   ![image](https://hackmd.io/_uploads/BJuSY5G31x.png) ![image](https://hackmd.io/_uploads/BJWFY5G3ye.png)

3. FSM of fir (ap_ctrl)

   Once the taps have been written into tap_RAM, the testbench can set **ap_start** `0 -> 1`. **ap_idle** then goes `1 -> 0`, meaning that the fir engine has started processing data x. After **sm_tlast** is sent to the testbench, indicating that the last data y has been output, **ap_done** and **ap_idle** rise to 1. Once **ap_done** has been read, it is pulled down to 0.

   ![image](https://hackmd.io/_uploads/r1O2JiG21e.png)

4. Convolution

   Because data_RAM can store at most 32 samples, I split the processing into two parts. In the first part, data_RAM holds fewer than tap_length samples, and the convolution runs from address 0x80 up to the address currently being written. In the second part, **data_cnt** decides which address the next data x is written to.
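To make the two parts of the convolution concrete, here is a small worked expansion for a hypothetical 3-tap filter. The tap count of 3 and the convention that samples before t = 0 are zero are assumptions chosen only for illustration:

$$
\begin{aligned}
y[0] &= h[0]\,x[0] \\
y[1] &= h[0]\,x[1] + h[1]\,x[0] \\
y[2] &= h[0]\,x[2] + h[1]\,x[1] + h[2]\,x[0] \\
y[3] &= h[0]\,x[3] + h[1]\,x[2] + h[2]\,x[1]
\end{aligned}
$$

For t < 3 the RAM is not yet full, so only t + 1 products are summed. From t = 3 onward every output uses all three taps, and the oldest sample is no longer needed: x[0] does not appear in y[3], so x[3] can overwrite its slot.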
![image](https://hackmd.io/_uploads/SyZ9DA7n1e.png) ![image](https://hackmd.io/_uploads/SydDYeE3kl.png)

## Code Analysis
### All code
```verilog
module fir
#(  parameter pADDR_WIDTH = 12,
    parameter pDATA_WIDTH = 32,
    parameter Tape_Num    = 12
)
(
    output wire                     awready,
    output wire                     wready,
    input  wire                     awvalid,
    input  wire [(pADDR_WIDTH-1):0] awaddr,
    input  wire                     wvalid,
    input  wire [(pDATA_WIDTH-1):0] wdata,
    output wire                     arready,
    input  wire                     rready,
    input  wire                     arvalid,
    input  wire [(pADDR_WIDTH-1):0] araddr,
    output wire                     rvalid,
    output wire [(pDATA_WIDTH-1):0] rdata,
    input  wire                     ss_tvalid,
    input  wire [(pDATA_WIDTH-1):0] ss_tdata,
    input  wire                     ss_tlast,
    output wire                     ss_tready,
    input  wire                     sm_tready,
    output wire                     sm_tvalid,
    output wire [(pDATA_WIDTH-1):0] sm_tdata,
    output wire                     sm_tlast,

    // bram for tap RAM
    output wire [3:0]               tap_WE,
    output wire                     tap_EN,
    output wire [(pDATA_WIDTH-1):0] tap_Di,
    output wire [(pADDR_WIDTH-1):0] tap_A,
    input  wire [(pDATA_WIDTH-1):0] tap_Do,

    // bram for data RAM
    output wire [3:0]               data_WE,
    output wire                     data_EN,
    output wire [(pDATA_WIDTH-1):0] data_Di,
    output wire [(pADDR_WIDTH-1):0] data_A,
    input  wire [(pDATA_WIDTH-1):0] data_Do,

    input  wire                     axis_clk,
    input  wire                     axis_rst_n
);

//========================== Declaration ==========================
//-------------------------- Axi-Lite -------------------------------
reg rvalid_next;
localparam WRITE = 2'b00;
localparam READ = 2'b01;
localparam IDLE = 2'b10;
localparam NULL_ADDR = 12'h90;
reg [1:0] axi_state;
reg [1:0] axi_state_next; //1.first read -> wait write 2. read/write same time -> if not wait/waiting -> write/read 3.
reg axi_state_finish; reg awready_tmp; reg wready_tmp; reg arready_tmp; reg rvalid_tmp; reg [(pADDR_WIDTH-1):0] awaddr_tmp; reg [(pADDR_WIDTH-1):0] araddr_tmp; reg [(pADDR_WIDTH-1):0] araddr_next; reg [(pDATA_WIDTH-1):0] rdata_tmp; reg tap_EN_tmp; reg [3:0] tap_WE_tmp; reg [(pADDR_WIDTH-1):0] tap_A_tmp; reg [(pDATA_WIDTH-1):0] tap_Di_tmp; localparam ADDR_MASK = 32'd127; reg write_alr; reg write_tap_next; reg write_tap; reg read_tap; reg read_tap_next; reg [(pDATA_WIDTH-1):0] data_length; reg [(pDATA_WIDTH-1):0] data_length_next; reg [5:0] tap_length; //[(pDATA_WIDTH-1):0] reg [5:0] tap_length_next; wire [(pDATA_WIDTH-1):0] tap_length_large; reg axi_lite_on; reg tap_EN_all; reg [3:0] tap_WE_all; reg [(pADDR_WIDTH-1):0] tap_A_all; reg [(pDATA_WIDTH-1):0] tap_Di_all; //-------------------------- ap_idle & ap_done & ap_start ------------------------------- reg [2:0] ap_ctrl; //[2] -> ap_idle,[1] -> ap_done,[0] -> ap_start reg ap_idle_next; reg ap_done_next; reg done_read_next; reg done_read; reg ap_start_next; //-------------------------- Axi-Stream SS (input X) ------------------------------- reg [1:0] ss_state; reg [1:0] ss_state_next; localparam IDLE_SS = 2'b10; localparam READ_SS = 2'b01; localparam WRITE_SS = 2'b00; reg [(pADDR_WIDTH-1):0] data_A_tmp; reg [(pDATA_WIDTH-1):0] data_Di_tmp; reg data_EN_tmp; reg [3:0] data_WE_tmp; reg ss_tready_tmp; reg ss_tready_next; reg [(pDATA_WIDTH-1):0] data_cnt; reg [(pDATA_WIDTH-1):0] data_cnt_next; reg read_should; reg read_should_next; reg ss_on; reg tap_EN_ss; reg [3:0] tap_WE_ss; reg [(pADDR_WIDTH-1):0] tap_A_ss; reg [(pDATA_WIDTH-1):0] tap_Di_ss; reg [5:0] operation; reg [5:0] current_next; reg [5:0] current; reg wait_sm; reg wait_sm_next; reg [1:0] toread_y_cnt; reg [1:0] toread_y_cnt_next; reg [(pDATA_WIDTH-1):0] stop_early_next; reg [(pDATA_WIDTH-1):0] stop_early; reg lock_off; reg lock_off_next; //-------------------------- Axi-Stream SM (input Y) ------------------------------- reg sm_tvalid_tmp; reg sm_tvalid_next; 
reg send_should_next; reg send_should; reg [(pDATA_WIDTH-1):0] sm_tdata_tmp; reg [(pDATA_WIDTH-1):0] sm_tdata_next; reg send_waiting; reg send_waiting_next; //-------------------------- FIR convolution ------------------------------- reg [(pDATA_WIDTH-1):0] data_in_conv; reg [(pDATA_WIDTH-1):0] tap_in_conv; reg [(pDATA_WIDTH-1):0] data_in_conv_next; reg [(pDATA_WIDTH-1):0] tap_in_conv_next; reg [(pDATA_WIDTH-1):0] h; reg [(pDATA_WIDTH-1):0] x; reg [(pDATA_WIDTH-1):0] y; wire [(pDATA_WIDTH-1):0] y_tmp; reg [(pDATA_WIDTH-1):0] y_next; reg [(pDATA_WIDTH-1):0] m; wire [(pDATA_WIDTH-1):0] m_tmp; reg [(pDATA_WIDTH-1):0] output_final; //-------------------------- Addr generator ------------------------------- reg [(pADDR_WIDTH-1):0] addr_genr_next; reg [(pADDR_WIDTH-1):0] addr_genr; reg [(pADDR_WIDTH-1):0] tap_genr_next; reg [(pADDR_WIDTH-1):0] tap_genr; //========================== Function ========================== //-------------------------- Axi-Lite ------------------------------- always @(*) begin if (arvalid && ((araddr == 12'h10) || (araddr == 12'h14) || (araddr == 12'd0))) begin //QS: arvalid can be take out araddr_next = araddr; read_tap_next = 0; end else if (arvalid && araddr[7]) begin araddr_next = araddr & ADDR_MASK; read_tap_next = 1; end else if (arvalid) begin araddr_next = NULL_ADDR; read_tap_next = 1; end else begin araddr_next = araddr_tmp; read_tap_next = read_tap; end end always @(*) begin if ((awaddr == 12'h10) || (awaddr == 12'h14) || ((awaddr == 12'd0) && ap_ctrl[2] && (axi_state_next == WRITE))) begin awaddr_tmp = awaddr; write_tap_next = 0; end else if (awaddr[7] == 1) begin awaddr_tmp = awaddr & ADDR_MASK; write_tap_next = 1; end else begin awaddr_tmp = NULL_ADDR; write_tap_next = 1; //QS end end always @(posedge axis_clk or negedge axis_rst_n) begin if (!axis_rst_n) begin axi_state <= IDLE; araddr_tmp <= NULL_ADDR; rvalid_tmp <= 0; write_tap <= 1; read_tap <= 1; done_read <= 0; data_length <= 0; tap_length <= 0; end else begin axi_state <= 
axi_state_next; araddr_tmp <= araddr_next; rvalid_tmp <= rvalid_next; write_tap <= write_tap_next; read_tap <= read_tap_next; done_read <= done_read_next; data_length <= data_length_next; tap_length <= tap_length_next; end end assign rvalid = rvalid_tmp; assign arready = arready_tmp; always @(*) begin if (arvalid && awvalid && wvalid && axi_state_finish) begin if (write_alr) begin axi_state_next = READ; end else begin axi_state_next = WRITE; end end else if (awvalid && wvalid && axi_state_finish) begin if (write_alr) begin axi_state_next = IDLE; end else begin axi_state_next = WRITE; end end else if (arvalid && axi_state_finish) begin axi_state_next = READ; end else if (!axi_state_finish) begin axi_state_next = axi_state; end else begin axi_state_next = IDLE; end end always @(*) begin case (axi_state) WRITE: begin axi_lite_on = 1; rdata_tmp = 32'd0; rvalid_next = 0; arready_tmp = 0; done_read_next = 0; //done_read_next = done_read; if (!ap_ctrl[2] && write_tap) begin wready_tmp = 1; awready_tmp = 1; tap_EN_tmp = 0; tap_WE_tmp = 4'b1111; tap_A_tmp = awaddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 1; write_alr = 1; data_length_next = data_length; tap_length_next = tap_length; ap_start_next = 0; //ap_ctrl[0] end else if (write_tap == 1) begin wready_tmp = 1; awready_tmp = 1; tap_EN_tmp = 1; tap_WE_tmp = 4'b1111; tap_A_tmp = awaddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 1; write_alr = 1; data_length_next = data_length; tap_length_next = tap_length; ap_start_next = 0; //ap_ctrl[0] end else begin tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = NULL_ADDR; tap_Di_tmp = 0; if (awaddr_tmp == 12'h10) begin wready_tmp = 1; awready_tmp = 1; if (!ap_ctrl[2]) begin data_length_next = data_length; end else begin data_length_next = wdata; //ERROR end tap_length_next = tap_length; axi_state_finish = 1; write_alr = 1; ap_start_next = 0; //ap_ctrl[0] end else if (awaddr_tmp == 12'h14) begin wready_tmp = 1; awready_tmp = 1; data_length_next = data_length; if (!ap_ctrl[2]) begin 
tap_length_next = tap_length; end else begin tap_length_next = wdata; //ERROR end axi_state_finish = 1; write_alr = 1; ap_start_next = 0; //ap_ctrl[0] end else begin wready_tmp = 1; awready_tmp = 1; data_length_next = data_length; tap_length_next = tap_length; //ap_start_next = 1; //ERROR wdata & (3'b011) if (!ap_ctrl[2]) begin ap_start_next = 0; end else begin ap_start_next = 1; end axi_state_finish = 1; write_alr = 1; end end end READ: begin axi_lite_on = 1; write_alr = 0; wready_tmp = 0; awready_tmp = 0; //arready_tmp = 1; data_length_next = data_length; tap_length_next = tap_length; ap_start_next = 0; //ap_ctrl[0] if ((!ap_ctrl[2] || (araddr_tmp == NULL_ADDR)) && read_tap) begin done_read_next = 0; //done_read_next = done_read; if (arvalid) begin //arready && arvalid arready_tmp = 1; rdata_tmp = 32'd0; rvalid_next = 1; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 0; end else if (rready && rvalid) begin arready_tmp = 0; rdata_tmp = 32'hffffffff; rvalid_next = 0; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 1; end else begin arready_tmp = 0; rdata_tmp = 32'd0; rvalid_next = rvalid_tmp; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 0; end end else if (!read_tap) begin tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = NULL_ADDR; tap_Di_tmp = 0; if (arvalid) begin //arready && arvalid arready_tmp = 1; rdata_tmp = 32'd0; rvalid_next = 1; axi_state_finish = 0; done_read_next = 0; end else if (rready && rvalid) begin arready_tmp = 0; rvalid_next = 0; if (araddr_tmp == 12'h10) begin rdata_tmp = data_length; end else if (araddr_tmp == 12'h14) begin rdata_tmp = tap_length; end else begin rdata_tmp = ap_ctrl; end axi_state_finish = 1; if (ap_ctrl[2] && ap_ctrl[1]) begin done_read_next = 1; end else begin done_read_next = 0; end end else begin arready_tmp = 0; rdata_tmp = 32'd0; rvalid_next = rvalid_tmp; axi_state_finish = 0; 
done_read_next = 0; //done_read_next = done_read; end end else begin done_read_next = 0; //done_read_next = done_read; if (arvalid) begin //arready && arvalid arready_tmp = 1; rdata_tmp = 32'd0; rvalid_next = 1; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 0; end else if (rready && rvalid) begin arready_tmp = 0; rvalid_next = 0; tap_EN_tmp = 1; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; rdata_tmp = tap_Do; axi_state_finish = 1; end else begin arready_tmp = 0; rdata_tmp = 32'd0; rvalid_next = rvalid_tmp; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 0; end end end IDLE: begin axi_lite_on = 0; write_alr = 0; rvalid_next = 0; rdata_tmp = 32'd0; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = NULL_ADDR; tap_Di_tmp = 0; arready_tmp = 0; wready_tmp = 0; awready_tmp = 0; axi_state_finish = 1; data_length_next = data_length; tap_length_next = tap_length; ap_start_next = 0; //ap_ctrl[0] done_read_next = 0; //done_read_next = done_read; end default: begin axi_lite_on = 0; write_alr = 0; rvalid_next = 0; rdata_tmp = 32'd0; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = NULL_ADDR; tap_Di_tmp = 0; arready_tmp = 0; wready_tmp = 0; awready_tmp = 0; axi_state_finish = 1; data_length_next = data_length; tap_length_next = tap_length; ap_start_next = 0; //ap_ctrl[0] done_read_next = 0; //done_read_next = done_read; end endcase end assign rdata = rdata_tmp; assign awready = awready_tmp; assign wready = wready_tmp; always @(*) begin if (axi_lite_on && !ss_on) begin tap_A_all = tap_A_tmp; tap_Di_all = tap_Di_tmp; tap_EN_all = tap_EN_tmp; tap_WE_all = tap_WE_tmp; end else begin tap_A_all = tap_A_ss; tap_Di_all = tap_Di_ss; tap_EN_all = tap_EN_ss; tap_WE_all = tap_WE_ss; end end assign tap_A = tap_A_all; assign tap_Di = tap_Di_all; assign tap_EN = tap_EN_all; assign tap_WE = tap_WE_all; //-------------------------- ap_idle & ap_done & ap_start ------------------------------- always 
@(posedge axis_clk or negedge axis_rst_n) begin if (!axis_rst_n) begin ap_ctrl[2] <= 1; ap_ctrl[1] <= 0; ap_ctrl[0] <= 0; end else begin ap_ctrl[2] <= ap_idle_next; ap_ctrl[1] <= ap_done_next; ap_ctrl[0] <= ap_start_next; end end always @(*) begin if (ap_ctrl[0] == 1) begin ap_idle_next = 0; end else if (sm_tlast == 1) begin ap_idle_next = 1; end else begin ap_idle_next = ap_ctrl[2]; end if (sm_tlast == 1 && !ap_ctrl[2]) begin ap_done_next = 1; end else if (ap_ctrl[2] && done_read) begin ap_done_next = 0; end else begin ap_done_next = ap_ctrl[1]; end end //-------------------------- Axi-Stream SS (input X) ------------------------------- always @(posedge axis_clk or negedge axis_rst_n) begin if (!axis_rst_n) begin data_cnt <= 0; ss_state <= IDLE_SS; ss_tready_tmp <= 0; read_should <= 0; current <= 0; wait_sm <= 0; stop_early <= (1 << pDATA_WIDTH) - 1; lock_off <= 1; end else begin data_cnt <= data_cnt_next; ss_state <= ss_state_next; ss_tready_tmp <= ss_tready_next; read_should <= read_should_next; current <= current_next; wait_sm <= wait_sm_next; stop_early <= stop_early_next; lock_off <= lock_off_next; end end assign tap_length_large = tap_length; always @(*) begin if (data_cnt < tap_length_large) begin operation = data_cnt; end else begin operation = tap_length - 1; end end always @(*) begin if (ss_tlast && lock_off) begin stop_early_next = data_cnt + 1; lock_off_next = 0; end else if (ap_ctrl[2] && done_read == 1) begin stop_early_next = (1 << pDATA_WIDTH) - 1; //QS: lock_off_next = 1; end else begin stop_early_next = stop_early; lock_off_next = lock_off; end end //reg [4:0] debug_ss; always @(*) begin if ((data_cnt <= data_length) && (data_cnt <= stop_early) && !ap_ctrl[2]) begin // wait_sm to let y_tmp get //&& !ss_tlast ss_on = 1; //debug_ss = 1; if (ss_tvalid && data_cnt == 0 && ss_tdata != 0) begin ss_state_next = WRITE_SS; data_cnt_next = data_cnt + 1; ss_tready_next = 1; current_next = 0; addr_genr_next = 0; tap_genr_next = 0; //(tap_length - 1) << 2 
wait_sm_next = 1; //debug_ss = 0; end else if (ss_tvalid && (current == operation) && !send_waiting_next) begin //!send_waiting && ss_tdata != 0 ss_state_next = WRITE_SS; data_cnt_next = data_cnt + 1; ss_tready_next = 1; current_next = 0; wait_sm_next = 1; //debug_ss = 1; if (addr_genr == ((tap_length - 1) << 2)) begin addr_genr_next = 0; end else begin addr_genr_next = addr_genr + 3'd4; end if (tap_genr == 0) begin tap_genr_next = ((tap_length - 1) << 2); end else begin tap_genr_next = tap_genr - 3'd4; end end else if ((current < operation) && current == 0 && !wait_sm && !send_waiting_next) begin ss_state_next = READ_SS; data_cnt_next = data_cnt; ss_tready_next = 0; current_next = current + 1; wait_sm_next = 0; //debug_ss = 2; if (data_cnt < tap_length_large) begin addr_genr_next = current; tap_genr_next = (operation * 3'd4); //((tap_length - 1) << 2) - (operation * 3'd4) end else begin addr_genr_next = ((data_cnt - tap_length + 1) % tap_length) * 3'd4; tap_genr_next = ((tap_length - 1) << 2); //0 end end else if (current < operation && !wait_sm && !send_waiting_next) begin ss_state_next = READ_SS; data_cnt_next = data_cnt; ss_tready_next = 0; current_next = current + 1; wait_sm_next = 0; //debug_ss = 3; if (addr_genr == ((tap_length - 1) << 2)) begin addr_genr_next = 0; end else begin addr_genr_next = addr_genr + 3'd4; end if (tap_genr == 0) begin tap_genr_next = ((tap_length - 1) << 2); end else begin tap_genr_next = tap_genr - 3'd4; end end else begin ss_state_next = IDLE_SS; data_cnt_next = data_cnt; ss_tready_next = 0; current_next = current; addr_genr_next = addr_genr; tap_genr_next = tap_genr; wait_sm_next = 0; //debug_ss = 4; end end else begin ss_state_next = IDLE_SS; if (done_read == 1) begin data_cnt_next = 0; //QS: end else begin data_cnt_next = data_cnt; end ss_tready_next = 0; current_next = 0; ss_on = 0; addr_genr_next = NULL_ADDR; tap_genr_next = NULL_ADDR; wait_sm_next = 0; //debug_ss = 5; end end assign ss_tready = ss_tready_tmp; always @(*) 
begin case (ss_state) WRITE_SS: begin data_A_tmp = addr_genr; data_Di_tmp = ss_tdata; data_EN_tmp = 1; data_WE_tmp = 4'b1111; data_in_conv = data_Do; tap_A_ss = tap_genr; tap_Di_ss = 0; tap_WE_ss = 0; tap_EN_ss = 0; tap_in_conv = tap_Do; read_should_next = 1; if (read_should) begin tap_EN_ss = 1; end else begin tap_EN_ss = 0; end end READ: begin data_A_tmp = addr_genr; data_Di_tmp = 0; data_WE_tmp = 0; data_in_conv = data_Do; tap_A_ss = tap_genr; tap_Di_ss = 0; tap_WE_ss = 0; tap_in_conv = tap_Do; if (read_should) begin tap_EN_ss = 1; data_EN_tmp = 1; end else begin tap_EN_ss = 0; data_EN_tmp = 0; end read_should_next = 1; end IDLE_SS: begin data_A_tmp = NULL_ADDR; data_Di_tmp = 0; data_WE_tmp = 0; data_in_conv = data_Do; tap_A_ss = NULL_ADDR; tap_Di_ss = 0; tap_WE_ss = 0; tap_in_conv = tap_Do; if (read_should) begin tap_EN_ss = 1; data_EN_tmp = 1; end else begin tap_EN_ss = 0; data_EN_tmp = 0; end read_should_next = 0; end default: begin data_A_tmp = NULL_ADDR; data_Di_tmp = 0; data_EN_tmp = 0; data_WE_tmp = 0; data_in_conv = 0; tap_A_ss = NULL_ADDR; tap_Di_ss = 0; tap_WE_ss = 0; tap_EN_ss = 0; tap_in_conv = 0; read_should_next = 0; end endcase end assign data_A = data_A_tmp; assign data_Di = data_Di_tmp; assign data_EN = data_EN_tmp; assign data_WE = data_WE_tmp; //-------------------------- Axi-Stream SM (input Y) ------------------------------- assign sm_tdata = sm_tdata_tmp; assign sm_tvalid = sm_tvalid_tmp; assign sm_tlast = ((data_cnt >= stop_early) && sm_tvalid) ? 
1 : 0; //&& send_should always @(posedge axis_clk or negedge axis_rst_n) begin if (!axis_rst_n) begin sm_tvalid_tmp <= 0; sm_tdata_tmp <= 0; send_should <= 0; send_waiting <= 0; toread_y_cnt <= 0; end else begin sm_tvalid_tmp <= sm_tvalid_next; sm_tdata_tmp <= sm_tdata_next; send_should <= send_should_next; send_waiting <= send_waiting_next; toread_y_cnt <= toread_y_cnt_next; end end always @(*) begin if (!ap_ctrl[2] && (data_cnt <= stop_early)) begin //sm_tlast if (sm_tready && send_should || sm_tready && send_waiting) begin // sm_tvalid_next = 1; sm_tdata_next = output_final; send_waiting_next = 0; end else if (send_should) begin //&& !(data_count <= tap_length) sm_tvalid_next = 0; sm_tdata_next = output_final; send_waiting_next = 1; end else begin sm_tvalid_next = 0; sm_tdata_next = 0; send_waiting_next = send_waiting; end end else begin sm_tvalid_next = 0; sm_tdata_next = 0; send_waiting_next = 0; end /*else if (!ap_ctrl[2] && sm_tready) begin sm_tvalid_next = 1; sm_tdata_next = 0; send_waiting_next = 0; end */ end //-------------------------- FIR convolution ------------------------------- assign m_tmp = (read_should) ? 
x * h : 0;
assign y_tmp = y + m;

always @(*) begin
    if (ss_state == WRITE_SS) begin
        toread_y_cnt_next = 1;
    end else if (toread_y_cnt == 1) begin
        toread_y_cnt_next = 2;
    end else begin
        toread_y_cnt_next = 0;
    end
end

always @(*) begin
    if (toread_y_cnt == 2 && !(y_tmp == 0)) begin // ss_state_next == WRITE_SS not write
        y_next = y_tmp;
        send_should_next = 1;
    end else begin
        y_next = output_final;
        send_should_next = 0;
    end
end

always @(*) begin
    if (read_should) begin
        x = data_in_conv;
        h = tap_in_conv;
    end else begin
        x = 0;
        h = 0;
    end
end

always @(posedge axis_clk or negedge axis_rst_n) begin
    if (!axis_rst_n) begin
        m <= 0;
        y <= 0;
        output_final <= 0;
    end else begin
        m <= m_tmp;
        output_final <= y_next;
        if (send_should_next == 1 || ap_ctrl[2]) begin //QS:
            y <= 0;
        end else begin
            y <= y_tmp;
        end
    end
end

//-------------------------- Addr generator -------------------------------
always @(posedge axis_clk or negedge axis_rst_n) begin
    if (!axis_rst_n) begin
        addr_genr <= NULL_ADDR;
        tap_genr <= NULL_ADDR;
    end else begin
        addr_genr <= addr_genr_next;
        tap_genr <= tap_genr_next;
    end
end

endmodule
```

### AXI_Lite
```verilog
always @(*) begin
    if (arvalid && ((araddr == 12'h10) || (araddr == 12'h14) || (araddr == 12'd0))) begin //QS: arvalid can be take out
        araddr_next = araddr;
        read_tap_next = 0;
    end else if (arvalid && araddr[7]) begin
        araddr_next = araddr & ADDR_MASK;
        read_tap_next = 1;
    end else if (arvalid) begin
        araddr_next = NULL_ADDR;
        read_tap_next = 1;
    end else begin
        araddr_next = araddr_tmp;
        read_tap_next = read_tap;
    end
end

always @(*) begin
    if ((awaddr == 12'h10) || (awaddr == 12'h14) || ((awaddr == 12'd0) && ap_ctrl[2] && (axi_state_next == WRITE))) begin
        awaddr_tmp = awaddr;
        write_tap_next = 0;
    end else if (awaddr[7] == 1) begin
        awaddr_tmp = awaddr & ADDR_MASK;
        write_tap_next = 1;
    end else begin
        awaddr_tmp = NULL_ADDR;
        write_tap_next = 1; //QS
    end
end
```
* Because only addresses 0x80 ~ 0xFF map to the RAM, an access goes to the RAM only when **araddr[7]** (for reads) or **awaddr[7]** (for writes) is 1. The `arvalid &&` term in the first branch is needed because the data is not read until **rready** and **rvalid** handshake. **araddr** may already be pulled down by then, so the address is stored in a register (**araddr_tmp**) while **arvalid** is high.
* The addresses matched by `(araddr == 12'h10) || (araddr == 12'h14) || (araddr == 12'd0)` are not in the RAM, but they must be readable and writable too.

```verilog
always @(*) begin
    if (arvalid && awvalid && wvalid && axi_state_finish) begin
        if (write_alr) begin
            axi_state_next = READ;
        end else begin
            axi_state_next = WRITE;
        end
    end else if (awvalid && wvalid && axi_state_finish) begin
        if (write_alr) begin
            axi_state_next = IDLE;
        end else begin
            axi_state_next = WRITE;
        end
    end else if (arvalid && axi_state_finish) begin
        axi_state_next = READ;
    end else if (!axi_state_finish) begin
        axi_state_next = axi_state;
    end else begin
        axi_state_next = IDLE;
    end
end
```
* The first `if` handles the case where the master asserts **arvalid** and **awvalid && wvalid** simultaneously. Since that is not considered a valid case in this lab, this branch can be modified.
* **write_alr** ensures that a still-asserted **awvalid && wvalid** does not keep the FSM in the WRITE state for a second cycle.

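The accept-once idea behind **write_alr** can also be looked at in isolation. The following is a minimal sketch, not the module above: the module name `axil_write_sketch` and the ports `clk` and `rst_n` are hypothetical, and it assumes the master holds both valid signals until it sees the ready signals. It waits until address and data are both valid (in whichever order they arrived), then raises both ready signals for exactly one cycle.

```verilog
// Minimal sketch under stated assumptions; names are illustrative only.
module axil_write_sketch (
    input  wire clk,
    input  wire rst_n,
    input  wire awvalid,
    input  wire wvalid,
    output reg  awready,
    output reg  wready
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            awready <= 1'b0;
            wready  <= 1'b0;
        end else if (awvalid && wvalid && !awready) begin
            // Both channels are valid and no transfer was accepted in the
            // previous cycle: accept address and data together.
            awready <= 1'b1;
            wready  <= 1'b1;
        end else begin
            // Drop the ready signals so one request is not accepted twice.
            awready <= 1'b0;
            wready  <= 1'b0;
        end
    end
endmodule
```

Here the `!awready` guard plays a role similar to **write_alr**: the cycle after a transfer is accepted, the ready signals fall even if the valid signals are still high.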
```verilog always @(*) begin case (axi_state) WRITE: begin axi_lite_on = 1; rdata_tmp = 32'd0; rvalid_next = 0; arready_tmp = 0; done_read_next = 0; //done_read_next = done_read; if (!ap_ctrl[2] && write_tap) begin wready_tmp = 1; awready_tmp = 1; tap_EN_tmp = 0; tap_WE_tmp = 4'b1111; tap_A_tmp = awaddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 1; write_alr = 1; data_length_next = data_length; tap_length_next = tap_length; ap_start_next = 0; //ap_ctrl[0] end else if (write_tap == 1) begin wready_tmp = 1; awready_tmp = 1; tap_EN_tmp = 1; tap_WE_tmp = 4'b1111; tap_A_tmp = awaddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 1; write_alr = 1; data_length_next = data_length; tap_length_next = tap_length; ap_start_next = 0; //ap_ctrl[0] end else begin tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = NULL_ADDR; tap_Di_tmp = 0; if (awaddr_tmp == 12'h10) begin wready_tmp = 1; awready_tmp = 1; if (!ap_ctrl[2]) begin data_length_next = data_length; end else begin data_length_next = wdata; //ERROR end tap_length_next = tap_length; axi_state_finish = 1; write_alr = 1; ap_start_next = 0; //ap_ctrl[0] end else if (awaddr_tmp == 12'h14) begin wready_tmp = 1; awready_tmp = 1; data_length_next = data_length; if (!ap_ctrl[2]) begin tap_length_next = tap_length; end else begin tap_length_next = wdata; //ERROR end axi_state_finish = 1; write_alr = 1; ap_start_next = 0; //ap_ctrl[0] end else begin wready_tmp = 1; awready_tmp = 1; data_length_next = data_length; tap_length_next = tap_length; //ap_start_next = 1; //ERROR wdata & (3'b011) if (!ap_ctrl[2]) begin ap_start_next = 0; end else begin ap_start_next = 1; end axi_state_finish = 1; write_alr = 1; end end end READ: begin axi_lite_on = 1; write_alr = 0; wready_tmp = 0; awready_tmp = 0; //arready_tmp = 1; data_length_next = data_length; tap_length_next = tap_length; ap_start_next = 0; //ap_ctrl[0] if ((!ap_ctrl[2] || (araddr_tmp == NULL_ADDR)) && read_tap) begin done_read_next = 0; //done_read_next = done_read; if (arvalid) begin //arready 
&& arvalid arready_tmp = 1; rdata_tmp = 32'd0; rvalid_next = 1; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 0; end else if (rready && rvalid) begin arready_tmp = 0; rdata_tmp = 32'hffffffff; rvalid_next = 0; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 1; end else begin arready_tmp = 0; rdata_tmp = 32'd0; rvalid_next = rvalid_tmp; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 0; end end else if (!read_tap) begin tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = NULL_ADDR; tap_Di_tmp = 0; if (arvalid) begin //arready && arvalid arready_tmp = 1; rdata_tmp = 32'd0; rvalid_next = 1; axi_state_finish = 0; done_read_next = 0; end else if (rready && rvalid) begin arready_tmp = 0; rvalid_next = 0; if (araddr_tmp == 12'h10) begin rdata_tmp = data_length; end else if (araddr_tmp == 12'h14) begin rdata_tmp = tap_length; end else begin rdata_tmp = ap_ctrl; end axi_state_finish = 1; if (ap_ctrl[2] && ap_ctrl[1]) begin done_read_next = 1; end else begin done_read_next = 0; end end else begin arready_tmp = 0; rdata_tmp = 32'd0; rvalid_next = rvalid_tmp; axi_state_finish = 0; done_read_next = 0; //done_read_next = done_read; end end else begin done_read_next = 0; //done_read_next = done_read; if (arvalid) begin //arready && arvalid arready_tmp = 1; rdata_tmp = 32'd0; rvalid_next = 1; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 0; end else if (rready && rvalid) begin arready_tmp = 0; rvalid_next = 0; tap_EN_tmp = 1; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; rdata_tmp = tap_Do; axi_state_finish = 1; end else begin arready_tmp = 0; rdata_tmp = 32'd0; rvalid_next = rvalid_tmp; tap_EN_tmp = 0; tap_WE_tmp = 0; tap_A_tmp = araddr_tmp; tap_Di_tmp = wdata; axi_state_finish = 0; end end end IDLE: begin axi_lite_on = 0; write_alr = 0; rvalid_next = 0; rdata_tmp = 32'd0; tap_EN_tmp = 
0;
        tap_WE_tmp = 0;
        tap_A_tmp = NULL_ADDR;
        tap_Di_tmp = 0;
        arready_tmp = 0;
        wready_tmp = 0;
        awready_tmp = 0;
        axi_state_finish = 1;
        data_length_next = data_length;
        tap_length_next = tap_length;
        ap_start_next = 0; //ap_ctrl[0]
        done_read_next = 0; //done_read_next = done_read;
    end
    default: begin
        axi_lite_on = 0;
        write_alr = 0;
        rvalid_next = 0;
        rdata_tmp = 32'd0;
        tap_EN_tmp = 0;
        tap_WE_tmp = 0;
        tap_A_tmp = NULL_ADDR;
        tap_Di_tmp = 0;
        arready_tmp = 0;
        wready_tmp = 0;
        awready_tmp = 0;
        axi_state_finish = 1;
        data_length_next = data_length;
        tap_length_next = tap_length;
        ap_start_next = 0; //ap_ctrl[0]
        done_read_next = 0; //done_read_next = done_read;
    end
    endcase
end
```
* WRITE state: it is divided into 3 parts. The first part blocks writes to tap_RAM while the fir engine is processing, the second part writes a tap into tap_RAM, and the final part writes **tap_length**, **data_length** or **ap_ctrl**. Because the masked tap address can coincide with the address of **tap_length**, **data_length** or **ap_ctrl** (for example, 0x94 & 0x7F = 0x14), the **write_tap** flag is needed to separate tap writes from writes to the other registers.
* READ state: it is divided along the same lines. Additionally, **done_read_next** is used to pull **ap_done** down after it has been read.

```verilog
always @(*) begin
    if (axi_lite_on && !ss_on) begin
        tap_A_all = tap_A_tmp;
        tap_Di_all = tap_Di_tmp;
        tap_EN_all = tap_EN_tmp;
        tap_WE_all = tap_WE_tmp;
    end else begin
        tap_A_all = tap_A_ss;
        tap_Di_all = tap_Di_ss;
        tap_EN_all = tap_EN_ss;
        tap_WE_all = tap_WE_ss;
    end
end

assign tap_A = tap_A_all;
assign tap_Di = tap_Di_all;
assign tap_EN = tap_EN_all;
assign tap_WE = tap_WE_all;
```
* This multiplexer exists because the convolution also needs to read taps from tap_RAM. To keep the AXI_Lite IDLE state from interfering with the convolution, the tap_RAM port is driven by the **ss_state** logic whenever AXI_Lite is not on or the stream side is active.

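The clear-on-read behaviour that **done_read_next** provides for **ap_done** can be sketched on its own. This is a simplified illustration, not the logic of the module above: the module name and the ports `clk`, `rst_n`, `last_out` and `status_read` are hypothetical stand-ins for the clock, reset, the "last output sent" event and the "status register was read" event.

```verilog
// Minimal sketch under stated assumptions; names are illustrative only.
module done_clear_on_read_sketch (
    input  wire clk,
    input  wire rst_n,
    input  wire last_out,     // hypothetical: pulses when the last y is sent
    input  wire status_read,  // hypothetical: pulses when a status read completes
    output reg  done
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            done <= 1'b0;
        else if (last_out)
            done <= 1'b1;      // setting the flag has priority over clearing it
        else if (status_read && done)
            done <= 1'b0;      // cleared only after the host has seen done = 1
    end
endmodule
```

Giving the set condition priority means a completion event is not lost if it happens in the same cycle as a read.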
### ap_ctrl
```verilog
always @(posedge axis_clk or negedge axis_rst_n) begin
    if (!axis_rst_n) begin
        ap_ctrl[2] <= 1;
        ap_ctrl[1] <= 0;
        ap_ctrl[0] <= 0;
    end else begin
        ap_ctrl[2] <= ap_idle_next;
        ap_ctrl[1] <= ap_done_next;
        ap_ctrl[0] <= ap_start_next;
    end
end

always @(*) begin
    if (ap_ctrl[0] == 1) begin
        ap_idle_next = 0;
    end else if (sm_tlast == 1) begin
        ap_idle_next = 1;
    end else begin
        ap_idle_next = ap_ctrl[2];
    end
    if (sm_tlast == 1 && !ap_ctrl[2]) begin
        ap_done_next = 1;
    end else if (ap_ctrl[2] && done_read) begin
        ap_done_next = 0;
    end else begin
        ap_done_next = ap_ctrl[1];
    end
end
```
* This part is explained in item 3 (FSM of fir) of the Design Approach section.

### AXI_Stream SS
```verilog
always @(posedge axis_clk or negedge axis_rst_n) begin
    if (!axis_rst_n) begin
        data_cnt <= 0;
        ss_state <= IDLE_SS;
        ss_tready_tmp <= 0;
        read_should <= 0;
        current <= 0;
        wait_sm <= 0;
        stop_early <= (1 << pDATA_WIDTH) - 1;
        lock_off <= 1;
    end else begin
        data_cnt <= data_cnt_next;
        ss_state <= ss_state_next;
        ss_tready_tmp <= ss_tready_next;
        read_should <= read_should_next;
        current <= current_next;
        wait_sm <= wait_sm_next;
        stop_early <= stop_early_next;
        lock_off <= lock_off_next;
    end
end

assign tap_length_large = tap_length;

always @(*) begin
    if (data_cnt < tap_length_large) begin
        operation = data_cnt;
    end else begin
        operation = tap_length - 1;
    end
end

always @(*) begin
    if (ss_tlast && lock_off) begin
        stop_early_next = data_cnt + 1;
        lock_off_next = 0;
    end else if (ap_ctrl[2] && done_read == 1) begin
        stop_early_next = (1 << pDATA_WIDTH) - 1; //QS:
        lock_off_next = 1;
    end else begin
        stop_early_next = stop_early;
        lock_off_next = lock_off;
    end
end

always @(*) begin
    if ((data_cnt <= data_length) && (data_cnt <= stop_early) && !ap_ctrl[2]) begin // wait_sm to let y_tmp get //&& !ss_tlast
        ss_on = 1;
        //debug_ss = 1;
        if (ss_tvalid && data_cnt == 0 && ss_tdata != 0) begin
            ss_state_next = WRITE_SS;
            data_cnt_next = data_cnt + 1;
            ss_tready_next = 1;
            current_next = 0;
            addr_genr_next = 0;
            tap_genr_next = 0; //(tap_length - 1) << 2
            wait_sm_next = 1;
            //debug_ss = 0;
        end else if (ss_tvalid && (current == operation) && !send_waiting_next) begin //!send_waiting && ss_tdata != 0
            ss_state_next = WRITE_SS;
            data_cnt_next = data_cnt + 1;
            ss_tready_next = 1;
            current_next = 0;
            wait_sm_next = 1;
            //debug_ss = 1;
            if (addr_genr == ((tap_length - 1) << 2)) begin
                addr_genr_next = 0;
            end else begin
                addr_genr_next = addr_genr + 3'd4;
            end
            if (tap_genr == 0) begin
                tap_genr_next = ((tap_length - 1) << 2);
            end else begin
                tap_genr_next = tap_genr - 3'd4;
            end
        end else if ((current < operation) && current == 0 && !wait_sm && !send_waiting_next) begin
            ss_state_next = READ_SS;
            data_cnt_next = data_cnt;
            ss_tready_next = 0;
            current_next = current + 1;
            wait_sm_next = 0;
            //debug_ss = 2;
            if (data_cnt < tap_length_large) begin
                addr_genr_next = current;
                tap_genr_next = (operation * 3'd4); //((tap_length - 1) << 2) - (operation * 3'd4)
            end else begin
                addr_genr_next = ((data_cnt - tap_length + 1) % tap_length) * 3'd4;
                tap_genr_next = ((tap_length - 1) << 2); //0
            end
        end else if (current < operation && !wait_sm && !send_waiting_next) begin
            ss_state_next = READ_SS;
            data_cnt_next = data_cnt;
            ss_tready_next = 0;
            current_next = current + 1;
            wait_sm_next = 0;
            //debug_ss = 3;
            if (addr_genr == ((tap_length - 1) << 2)) begin
                addr_genr_next = 0;
            end else begin
                addr_genr_next = addr_genr + 3'd4;
            end
            if (tap_genr == 0) begin
                tap_genr_next = ((tap_length - 1) << 2);
            end else begin
                tap_genr_next = tap_genr - 3'd4;
            end
        end else begin
            ss_state_next = IDLE_SS;
            data_cnt_next = data_cnt;
            ss_tready_next = 0;
            current_next = current;
            addr_genr_next = addr_genr;
            tap_genr_next = tap_genr;
            wait_sm_next = 0;
            //debug_ss = 4;
        end
    end else begin
        ss_state_next = IDLE_SS;
        if (done_read == 1) begin
            data_cnt_next = 0; //QS:
        end else begin
            data_cnt_next = data_cnt;
        end
        ss_tready_next = 0;
        current_next = 0;
        ss_on = 0;
        addr_genr_next = NULL_ADDR;
        tap_genr_next = NULL_ADDR;
        wait_sm_next = 0;
        //debug_ss = 5;
    end
end
```
* **tap_length_large** widens **tap_length** to pDATA_WIDTH bits so that both operands of `data_cnt < tap_length_large` have the same width.
* **operation** is the number of operations in one round of the convolution. For example, after x1 is written, only x1 * tap1 is computed, so there is only one operation.
* **lock_off** ensures that **stop_early** changes only once per set of data x, and **stop_early** controls the state of the AXI_Stream side.
* **wait_sm_next** adds a one-cycle delay. Because the convolution is pipelined, after the first x1 is written we have to wait two more cycles to get output y1. To avoid accumulating a wrong value (one that includes a multiplication from the next round, when calculating y2), we insert an IDLE state between rounds.
* **current** counts the steps within a round and determines which state comes next.
* **send_waiting_next** is used to wait for the testbench to receive the output data y. If the testbench has not raised **sm_tready** yet, **ss_state** stays in the IDLE_SS state.
* The address generator is already integrated into the **ss_state** logic.

### AXI_Stream SM
```verilog
assign sm_tdata = sm_tdata_tmp;
assign sm_tvalid = sm_tvalid_tmp;
assign sm_tlast = ((data_cnt >= stop_early) && sm_tvalid) ? 
    1 : 0; //&& send_should

always @(posedge axis_clk or negedge axis_rst_n) begin
    if (!axis_rst_n) begin
        sm_tvalid_tmp <= 0;
        sm_tdata_tmp <= 0;
        send_should <= 0;
        send_waiting <= 0;
        toread_y_cnt <= 0;
    end else begin
        sm_tvalid_tmp <= sm_tvalid_next;
        sm_tdata_tmp <= sm_tdata_next;
        send_should <= send_should_next;
        send_waiting <= send_waiting_next;
        toread_y_cnt <= toread_y_cnt_next;
    end
end

always @(*) begin
    if (!ap_ctrl[2] && (data_cnt <= stop_early)) begin //sm_tlast
        if (sm_tready && send_should || sm_tready && send_waiting) begin
            // testbench is ready: send the stored y
            sm_tvalid_next = 1;
            sm_tdata_next = output_final;
            send_waiting_next = 0;
        end else if (send_should) begin //&& !(data_count <= tap_length)
            // y is ready but sm_tready is low: remember that a send is pending
            sm_tvalid_next = 0;
            sm_tdata_next = output_final;
            send_waiting_next = 1;
        end else begin
            sm_tvalid_next = 0;
            sm_tdata_next = 0;
            send_waiting_next = send_waiting;
        end
    end else begin
        sm_tvalid_next = 0;
        sm_tdata_next = 0;
        send_waiting_next = 0;
    end
    /*else if (!ap_ctrl[2] && sm_tready) begin
        sm_tvalid_next = 1;
        sm_tdata_next = 0;
        send_waiting_next = 0;
    end */
end
```

* **send_should** is driven in the convolution section. As noted above, the output y is available two cycles after data x is written, so **send_should** rises to 1 when **toread_y_cnt** reaches the second cycle.

### Fir Convolution

```verilog
assign m_tmp = (read_should) ?
    x * h : 0;
assign y_tmp = y + m;

// count the two cycles that follow a write
always @(*) begin
    if (ss_state == WRITE_SS) begin
        toread_y_cnt_next = 1;
    end else if (toread_y_cnt == 1) begin
        toread_y_cnt_next = 2;
    end else begin
        toread_y_cnt_next = 0;
    end
end

always @(*) begin
    if (toread_y_cnt == 2 && !(y_tmp == 0)) begin // ss_state_next == WRITE_SS not write
        y_next = y_tmp;
        send_should_next = 1;
    end else begin
        y_next = output_final;
        send_should_next = 0;
    end
end

always @(*) begin
    if (read_should) begin
        x = data_in_conv;
        h = tap_in_conv;
    end else begin
        x = 0;
        h = 0;
    end
end

always @(posedge axis_clk or negedge axis_rst_n) begin
    if (!axis_rst_n) begin
        m <= 0;
        y <= 0;
        output_final <= 0;
    end else begin
        m <= m_tmp;
        output_final <= y_next;
        // clear the accumulator once y is captured or the engine is idle
        if (send_should_next == 1 || ap_ctrl[2]) begin //QS:
            y <= 0;
        end else begin
            y <= y_tmp;
        end
    end
end
```

* **output_final** stores the output y until it can be sent, since **sm_tready** may not be 1 at the moment y becomes available.

## Result

### AXI_Lite

![image](https://hackmd.io/_uploads/BkcrzVV21l.png)

* axi_state: `00 = WRITE`, `01 = READ`, `10 = IDLE`

### AXI_Stream

![image](https://hackmd.io/_uploads/B13rv4En1x.png)

* ss_state: `00 = WRITE_SS`, `01 = READ_SS`, `10 = IDLE_SS`

### Area

![image](https://hackmd.io/_uploads/BkHTdVVn1x.png)
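
As a side note on the address generator in the AXI_Stream SS section: the wrap-around stepping of **addr_genr** and **tap_genr** appears twice inline in that logic. The sketch below isolates one step of it so the behavior can be checked on its own. The module name `wrap_ptr`, its ports, and the small testbench are hypothetical and not part of the lab_3 source, and `last_addr = 40` (that is, `tap_length = 11`) is only an example value.

```verilog
// Hypothetical helper, not part of the lab_3 source: one step of the
// wrap-around pointers that the SS logic computes inline.
module wrap_ptr #(
    parameter ADDR_W = 12
) (
    input  wire [ADDR_W-1:0] addr_genr,      // current data_RAM byte address
    input  wire [ADDR_W-1:0] tap_genr,       // current tap_RAM byte address
    input  wire [ADDR_W-1:0] last_addr,      // (tap_length - 1) << 2
    output wire [ADDR_W-1:0] addr_genr_next,
    output wire [ADDR_W-1:0] tap_genr_next
);
    // data pointer moves up one 32-bit word and wraps to 0
    assign addr_genr_next = (addr_genr == last_addr) ? {ADDR_W{1'b0}}
                                                     : addr_genr + 3'd4;
    // tap pointer moves down one word and wraps to the last tap
    assign tap_genr_next  = (tap_genr == {ADDR_W{1'b0}}) ? last_addr
                                                         : tap_genr - 3'd4;
endmodule

// Minimal self-checking testbench (also hypothetical).
module wrap_ptr_tb;
    reg  [11:0] a, t;
    wire [11:0] a_n, t_n;

    // example only: tap_length = 11, so last_addr = 10 << 2 = 40
    wrap_ptr dut (
        .addr_genr(a), .tap_genr(t), .last_addr(12'd40),
        .addr_genr_next(a_n), .tap_genr_next(t_n)
    );

    initial begin
        // ordinary step: +4 on the data side, -4 on the tap side
        a = 12'd36; t = 12'd4; #1;
        if (a_n !== 12'd40 || t_n !== 12'd0)  $display("FAIL: step");
        // wrap: data pointer returns to 0, tap pointer to last_addr
        a = 12'd40; t = 12'd0; #1;
        if (a_n !== 12'd0  || t_n !== 12'd40) $display("FAIL: wrap");
        $display("wrap_ptr_tb finished");
        $finish;
    end
endmodule
```

Because the two pointers move in opposite directions, each newer x is paired with a lower tap index, which matches the h[i] ⋅ x[t−i] form of the convolution.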