[TOC]

# 1. Parity Generator

## Top Function (Kernel)

### In HLS

``` cpp=
#include <ap_int.h>

// W is the input width; 16 is assumed here to match the Verilog version.
#define W 16

// ap_uint<W> a means "a" is a W-bit unsigned integer
bool parity_generator(ap_uint<W> a) {
#pragma HLS INTERFACE ap_none port=a
#pragma HLS INTERFACE ap_ctrl_none port=return
    bool parity = 0;
    for (int i = 0; i < W; i++) {
#pragma HLS UNROLL
        parity = parity ^ a[i];
    }
    return parity;
}
```

#### Without UNROLL

The following is the logic generated if we don't add `#pragma HLS UNROLL` to unroll the loop.

![](https://hackmd.io/_uploads/Hk2hZYmo3.jpg)

- A circuit with feedback incorporates some form of memory or state storage, which allows it to consider past inputs or outputs when generating the current output.
- These circuits are known as sequential logic circuits.

The following are the utilization and the dataflow if we don't unroll the loop.

![](https://hackmd.io/_uploads/ryy9zKms2.jpg)

#### Loop Unrolling

We can unroll the loop manually:

![](https://hackmd.io/_uploads/rJ9NQKmj3.jpg)

- The green code is one way of manually unrolling the loop.
- Unrolling a loop creates multiple independent operations rather than a single collection of operations.

#### **#pragma HLS UNROLL**

We can unroll the loop automatically by adding `#pragma HLS UNROLL`,

![](https://hackmd.io/_uploads/SkYbEYmoh.png)

and it will create a balanced tree structure as follows:

![](https://hackmd.io/_uploads/SJcuNFmsh.jpg)

- The pragma transforms loops by creating multiple copies of the loop body in the RTL design, which allows some or all loop iterations to occur in parallel.

#### .v file in HLS:

- Since we added `#pragma HLS UNROLL` to our design, HLS creates the balanced structure mentioned above:

![](https://hackmd.io/_uploads/SJi7IFQj2.png)
![](https://hackmd.io/_uploads/HybmIKXon.jpg)

- The following are the utilization and the dataflow after adding `#pragma HLS UNROLL` to our design.
![](https://hackmd.io/_uploads/Hy_lrYQs2.jpg)

#### Parity Generator in Testbench (golden in HLS)

``` cpp=
bool parity_generator_golden(ap_uint<W> a) {
    bool parity = 0;
    for (int i = 0; i < W; i++) {
        parity = parity ^ a[i];
    }
    return parity;
}
```

#### Result:

- The kernel generates the even-parity bit: the output is 1 when `a` contains an odd number of ones.

![](https://hackmd.io/_uploads/HkYnBFQs3.png)

### In Verilog

- Here is the kernel written in Verilog:

``` verilog=
module parity_generator(
    input  wire [15:0] a,
    output reg  [0:0]  parity
);
    integer i;

    always @* begin
        parity = 1'b0;
        for (i = 0; i < 16; i = i + 1) begin
            parity = parity ^ a[i];
        end
    end
endmodule
```

## Testbench

- Testbench in Verilog

``` verilog=
module vic_group3_testbench();
    wire [0:0]  parity_hw;
    reg  [0:0]  parity_sw;
    reg  [15:0] a_hw;

    // Design under test
    design_1_wrapper DUT0 (.a(a_hw), .parity(parity_hw));
    // parity_generator DUT1 (.a(a_hw), .parity(parity_hw));

    reg [15:0] tmp;
    integer i, j;

```

- Golden parity generator in the testbench

``` verilog=13
    task parity_generator_sw;
        input  [15:0] a;
        output [0:0]  parity_sw;
        begin
            parity_sw = 1'b0;
            for (i = 0; i < 16; i = i + 1) begin
                parity_sw = parity_sw ^ a[i];
            end
        end
    endtask

```

- Main function of the testbench

``` verilog=24
    initial begin
        tmp = 16'd0;  // initialize the value
        for (j = 0; j < 65536; j = j + 1) begin
            a_hw = tmp;
            // Get the golden value
            parity_generator_sw(tmp, parity_sw);
            // Compare the values after a short delay
            #5
            if (parity_hw != parity_sw) begin
                $display("Error at %b, parity_sw = %b, parity_hw = %b", tmp, parity_sw, parity_hw);
            end
            else begin
                // $display("%b, parity_sw = %b, parity_hw = %b", tmp, parity_sw, parity_hw);
            end
            // Increment tmp after a short delay to avoid an input/output mismatch
            #5
            tmp = tmp + 1'b1;
        end
    end
endmodule
```

### DUT (IP)

- Export RTL IP and Create Block Design in Vivado

![](https://hackmd.io/_uploads/rkXsUtQsn.png)

#### Utilization

![](https://hackmd.io/_uploads/HyMU2SLi2.png)

- Only 3 LUTs are used.
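
One way to see why three LUTs can be enough for a 16-bit parity is sketched below in plain C++. It assumes the target device has 6-input LUTs (the device is not stated here, so this is an assumption), and the grouping shown is only one possible mapping, not necessarily the one Vivado chose. The names `parity_loop`, `xor_bits`, `parity_three_luts`, and `all_inputs_agree` are hypothetical helpers introduced for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Reference model: sequential XOR over all 16 bits, like the golden function.
bool parity_loop(uint16_t a) {
    bool parity = false;
    for (int i = 0; i < 16; i++) {
        parity ^= ((a >> i) & 1u) != 0;
    }
    return parity;
}

// XOR of the lowest `width` bits of x (models one LUT with `width` inputs).
bool xor_bits(uint32_t x, int width) {
    bool p = false;
    for (int i = 0; i < width; i++) {
        p ^= ((x >> i) & 1u) != 0;
    }
    return p;
}

// Hypothetical mapping onto three 6-input LUTs (assumed LUT size):
//   LUT0: bits 0..5, LUT1: bits 6..11,
//   LUT2: bits 12..15 plus the two partial results.
bool parity_three_luts(uint16_t a) {
    bool lut0 = xor_bits(a & 0x3Fu, 6);
    bool lut1 = xor_bits((a >> 6) & 0x3Fu, 6);
    uint32_t last = ((a >> 12) & 0xFu)
                  | (static_cast<uint32_t>(lut0) << 4)
                  | (static_cast<uint32_t>(lut1) << 5);
    return xor_bits(last, 6);
}

// Exhaustive comparison over all 65536 inputs, as the Verilog testbench does.
bool all_inputs_agree() {
    for (uint32_t a = 0; a < 65536; a++) {
        if (parity_loop(static_cast<uint16_t>(a)) !=
            parity_three_luts(static_cast<uint16_t>(a))) {
            return false;
        }
    }
    return true;
}
```

Calling `all_inputs_agree()` checks that the three-group decomposition gives the same parity as the loop for every 16-bit input.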
#### Waveform

![](https://hackmd.io/_uploads/r1XGFtms2.png)

- `parity_hw` and `parity_sw` are the same for every input.

### DUT (Verilog)

#### Utilization

![](https://hackmd.io/_uploads/rkrRhHUon.png)

#### Waveform

![](https://hackmd.io/_uploads/HJeOhr8s3.png)

-------------------------------------------------------

# 2. Leading One

## Top Function (Kernel)

### In HLS

![](https://hackmd.io/_uploads/BkZODjQin.png)

#### Utilization & Dataflow in HLS

- If we use a 10 ns clock period, HLS may turn the design into sequential logic.

![](https://hackmd.io/_uploads/BJQ2wiQjn.png)

- Here are the utilization and dataflow:

![](https://hackmd.io/_uploads/rkuCvimoh.png)
![](https://hackmd.io/_uploads/rJOyOjXsn.png)
![](https://hackmd.io/_uploads/Hkb1_iQj2.png)

- **So, the solution is to synthesize with a longer clock period.**

![](https://hackmd.io/_uploads/Hy4Q_iQo3.png)

- Then, the utilization and dataflow in HLS:

![](https://hackmd.io/_uploads/ryGrOsms2.png)

- We can see that the design uses only LUTs (combinational).

![](https://hackmd.io/_uploads/BJtHdo7i3.png)

- The following is the golden `leading_one` in the testbench.

![](https://hackmd.io/_uploads/ryH0uj7i3.png)

#### Result

![](https://hackmd.io/_uploads/SyW-Ys7in.png)

### In Verilog

``` verilog=
module leading_one(
    input  wire [8:0] a,
    output reg  [4:0] one
);
    always @* begin
        if (a[8] == 1'b1) begin
            one = 5'd8;
        end
        else if (a[7] == 1'b1) begin
            one = 5'd7;
        end
        else if (a[6] == 1'b1) begin
            one = 5'd6;
        end
        else if (a[5] == 1'b1) begin
            one = 5'd5;
        end
        else if (a[4] == 1'b1) begin
            one = 5'd4;
        end
        else if (a[3] == 1'b1) begin
            one = 5'd3;
        end
        else if (a[2] == 1'b1) begin
            one = 5'd2;
        end
        else if (a[1] == 1'b1) begin
            one = 5'd1;
        end
        else if (a[0] == 1'b1) begin
            one = 5'd0;
        end
        else
            one = 5'd0 - 1'b1;  // no bit set: all ones (5'b11111)
    end
endmodule
```

## Testbench

- Testbench in Verilog

``` verilog=
module vic_group3_testbench();
    wire [4:0] one_hw;
    reg  [4:0] one_sw;
    reg  [8:0] a_hw;

    // design under test
    design_1_wrapper DUT0 (.a(a_hw), .leading_one(one_hw));
    // leading_one DUT1 (.a(a_hw), .one(one_hw));

    reg [8:0] tmp;
    integer i;

```

- Write a golden `leading_one_sw` in the testbench

``` verilog=13
    task leading_one_sw;
        input  [8:0] a;
        output [4:0] leading_one_sw;
        begin
            if (a[8] == 1'b1) begin
                leading_one_sw = 5'd8;
            end
            else if (a[7] == 1'b1) begin
                leading_one_sw = 5'd7;
            end
            else if (a[6] == 1'b1) begin
                leading_one_sw = 5'd6;
            end
            else if (a[5] == 1'b1) begin
                leading_one_sw = 5'd5;
            end
            else if (a[4] == 1'b1) begin
                leading_one_sw = 5'd4;
            end
            else if (a[3] == 1'b1) begin
                leading_one_sw = 5'd3;
            end
            else if (a[2] == 1'b1) begin
                leading_one_sw = 5'd2;
            end
            else if (a[1] == 1'b1) begin
                leading_one_sw = 5'd1;
            end
            else if (a[0] == 1'b1) begin
                leading_one_sw = 5'd0;
            end
            else begin
                leading_one_sw = 5'd0 - 1'b1;
            end
        end
    endtask

```

- Then, the main function of the testbench.

``` verilog=50
    initial begin
        tmp = 9'd0;  // initialize the value (otherwise tmp starts as x)
        for (i = 0; i < 512; i = i + 1) begin
            a_hw = tmp;
            // get the golden value
            leading_one_sw(tmp, one_sw);
            // compare after a short delay so that the DUT output has settled
            #5
            if (one_hw != one_sw) begin
                $display("Error at %b, leading_one_sw = %b, leading_one_hw = %b", tmp, one_sw, one_hw);
            end
            else begin
                $display("%b, leading_one_sw = %b, leading_one_hw = %b", tmp, one_sw, one_hw);
            end
            // update tmp after a short delay to avoid an input/output mismatch
            #5
            tmp = tmp + 1'b1;
        end
    end
endmodule
```

### DUT (IP)

- Export RTL IP and Create Block Design in Vivado

![](https://hackmd.io/_uploads/S1xMYiQi3.png)

#### Utilization

![](https://hackmd.io/_uploads/HyyOABUi2.png)

#### Waveform

![](https://hackmd.io/_uploads/rkGWnsmin.png)

### DUT (Verilog)

#### Utilization

![](https://hackmd.io/_uploads/rkW5gIIj3.png)

#### Waveform

![](https://hackmd.io/_uploads/ry6oy8Lj3.png)

----------------------------------------------------

# 3. Integer Division & Modulus

## Top Function (Kernel)

### In HLS

``` cpp=
const int n = 1234101;

int divbyconstant(int a) {
#pragma HLS INLINE off
    return a / n;
}

void integer_division_modulus(int a, int &r) {
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE ap_none port=a
#pragma HLS INTERFACE ap_none port=r
    r = a - n * divbyconstant(a);
}
```

- It computes the remainder of `a` divided by `n`.

#### Utilization in HLS

![](https://hackmd.io/_uploads/HJHBPfrj2.png)

- The computation itself is still combinational, since it is built only from DSPs and LUTs.
- In addition, 17 FFs are used as registers.

### In Verilog

``` verilog=
`define N 32'd1234101

module integer_division_modulus(
    input  wire [31:0] a_in,
    output reg  [31:0] r_out
);
    reg [31:0] tmp;

    function [31:0] divbyconst;
        input [31:0] a_in;
        begin
            divbyconst = a_in / `N;
        end
    endfunction

    always @* begin
        tmp   = divbyconst(a_in);
        r_out = a_in - (`N * tmp);
    end
endmodule
```

## Testbench

- Testbench in Verilog

``` verilog=
`define N 32'd1234101

module VIC_TESTBENCH();
    reg         clk;           // 10ns clk
    reg         rst;           // active high, synchronous reset
    reg  [31:0] a_in;          // input of kernel and golden
    wire [31:0] r_hw;          // remainder of kernel
    reg  [31:0] r_sw;          // remainder of golden
    reg  [0:0]  error_detect;  // set to 1 when an error is detected
    integer i;

    // design under test
    design_1_wrapper DUT (.ap_clk(clk), .ap_rst(rst), .a(a_in), .r(r_hw));
    // integer_division_modulus DUT1 (.a_in(a_in),
    //                                .r_out(r_hw));

    // 10ns clk generator
    initial begin
        clk = 1'b1;
        forever begin
            #5 clk = ~clk;
        end
    end

    // divide by constant
    task divbyconst;
        input  [31:0] a_in;
        output [31:0] d_out;
        begin
            d_out = a_in / `N;
        end
    endtask

    // golden integer_division_modulus
    task golden_integer_division_modulus;
        input  [31:0] a_in;   // the 32-bit input
        output [31:0] r_out;  // the 32-bit output remainder
        reg    [31:0] tmp;    // temporary value holding the output of divbyconst
        begin
            tmp = 32'd0;      // initialize tmp
            divbyconst(a_in, tmp);
            r_out = a_in - (`N * tmp);
        end
    endtask

    // main function of the testbench
    initial begin
        // initialize the value of reset, input, error_detect
        rst = 1'b1;
        a_in = `N;
        error_detect = 1'b0;
        // deassert rst after 50ns, since the kernel has no output during the first 40ns
        #50
        @(posedge clk)
        rst = 1'b0;
        // Compare the value of kernel and golden
        for (i = 0; i < 3000; i = i + 1) begin
            @(posedge clk)
            golden_integer_division_modulus(a_in, r_sw);
            #5
            if (r_sw != r_hw) begin
                $display("Error at %d, r_sw = %d, r_hw = %d", a_in, r_sw, r_hw);
                error_detect = 1'b1;
            end
            else begin
                /* $display("%d, r_sw = %d, r_hw = %d", a_in, r_sw, r_hw); */
            end
            #4
            a_in = a_in + 1'b1;
        end
        // Check the error
        if (error_detect != 1'b0) begin
            $display("Failed!!!");
        end
        else begin
            $display("Test Passed!!");
        end
    end
endmodule
```

### DUT (IP)

- Export RTL IP and create block design in Vivado

![](https://hackmd.io/_uploads/HJ0pvPSjn.png)

#### Utilization

![](https://hackmd.io/_uploads/Hy3-XdBsh.png)

- 5 FFs are used as registers.

#### Waveform

![](https://hackmd.io/_uploads/HJSf7OBi3.png)

### DUT (Verilog)

#### Utilization

![](https://hackmd.io/_uploads/SkVqidUon.png)

- The Verilog design uses only LUTs, unlike the HLS design, which contains FFs for registers.

#### Waveform

![](https://hackmd.io/_uploads/H15nj_Lo2.png)

- We can see that the output `r_hw` of the Verilog design changes at the same time as the input `a_in`, since it is purely combinational logic built from LUTs.

--------------------------------------------------------

# 4. Binary-to-BCD (Division)

## Top Function (Kernel)

### In HLS

``` cpp=
// uint4, uint14 and uint16 are assumed to be typedefs of
// ap_uint<4>, ap_uint<14> and ap_uint<16>.

// Extract the lowest decimal digit of a, then drop it from a.
void get_digit(uint14 &a, uint4 &digit) {
    digit = a - 10 * (a / 10);
    a = a / 10;
}

void binary2bcd_div(uint14 in_binary, uint16 *packed_bcd) {
#pragma HLS INTERFACE ap_none port=in_binary
#pragma HLS INTERFACE ap_none port=packed_bcd
#pragma HLS INTERFACE ap_ctrl_none port=return
    uint14 a = in_binary;
    uint4 digit_0;
    uint4 digit_1;
    uint4 digit_2;
    uint4 digit_3;
    get_digit(a, digit_0);
    get_digit(a, digit_1);
    get_digit(a, digit_2);
    get_digit(a, digit_3);
    // (x, y, ...) concatenates ap_uint values, most significant first
    *packed_bcd = (digit_3, digit_2, digit_1, digit_0);
}
```

### In Verilog

``` verilog=
module binary2bcd_div(
    input  wire [13:0] in_binary,
    output reg  [15:0] packed_bcd
);
    reg [13:0] a;
    reg [3:0]  digit_0;
    reg [3:0]  digit_1;
    reg [3:0]  digit_2;
    reg [3:0]  digit_3;

    function [3:0] get_digit;
        input [13:0] a;
        begin
            get_digit = a - 10 * (a / 10);
        end
    endfunction

    always @* begin
        a = in_binary;
        digit_0 = get_digit(a);
        a = a / 10;
        digit_1 = get_digit(a);
        a = a / 10;
        digit_2 = get_digit(a);
        a = a / 10;
        digit_3 = get_digit(a);
        a = a / 10;
        packed_bcd = {digit_3, digit_2, digit_1, digit_0};
    end
endmodule
```

## Testbench

- Testbench in Verilog

``` verilog=
module VIC_TESTBENCH();
    reg         clk;
    reg         rst;
    reg  [13:0] in_binary;
    wire [15:0] packed_bcd_hw;
    reg  [15:0] packed_bcd_sw;
    reg  [0:0]  error_detect;
    integer i;

    // design under test
    design_1_wrapper DUT0 (.ap_clk(clk), .ap_rst(rst), .in_binary(in_binary), .packed_bcd(packed_bcd_hw));
    /* binary2bcd_div DUT1 (.in_binary(in_binary),
                            .packed_bcd(packed_bcd_hw)); */

    // 10ns clk generator
    initial begin
        clk = 1'b1;
        forever begin
            #5 clk = ~clk;
        end
    end

    // Golden get_digit
    task get_digit;
        input  [13:0] a;
        output [3:0]  digit;
        begin
            digit = a - 10 * (a / 10);
        end
    endtask

    // golden
    task binary2bcd_sw;
        input  [13:0] in_binary;
        output [15:0] packed_bcd;
        reg    [13:0] a;
        reg    [3:0]  digit_0;
        reg    [3:0]  digit_1;
        reg    [3:0]  digit_2;
        reg    [3:0]  digit_3;
        begin
            a = in_binary;
            get_digit (a, digit_0);
            a = a / 10;
            get_digit (a, digit_1);
            a = a / 10;
            get_digit (a, digit_2);
            a = a / 10;
            get_digit (a, digit_3);
            a = a / 10;
            packed_bcd = {digit_3, digit_2, digit_1, digit_0};
        end
    endtask

    // main function of the testbench
    initial begin
        rst = 1'b1;
        in_binary = 14'd0;
        error_detect = 1'b0;
        #30
        @(posedge clk)
        rst = 1'b0;
        // check every 4-digit value, 0..9999
        for (i = 0; i < 10000; i = i + 1) begin
            @(posedge clk)
            binary2bcd_sw (in_binary, packed_bcd_sw);
            #80
            if (packed_bcd_sw != packed_bcd_hw) begin
                $display("Error at %d, bcd_sw = %b, bcd_hw = %b", in_binary, packed_bcd_sw, packed_bcd_hw);
                error_detect = 1'b1;
            end
            #0
            in_binary = in_binary + 1'b1;
        end
        #5
        if (error_detect == 1'b0) begin
            $display("Test Passed");
        end
        else begin
            $display("Error!!!");
        end
    end
endmodule
```

### DUT (IP)

- Export RTL IP and create a block design

![](https://hackmd.io/_uploads/rJmzbFqin.png)

#### Utilization

![](https://hackmd.io/_uploads/B1uwVEqj2.png)

- There are 9 FFs for registers.

#### Waveform

1. This is the waveform of the IP when I change `in_binary` every 50 ns, with a clock period of 10 ns.
   ![](https://hackmd.io/_uploads/Hk7bb4cj2.png)
   - We can see that every fourth `packed_bcd_hw` value lasts longer than the others (40 ns longer).
2. Then, I change `in_binary` every 90 ns.
   ![](https://hackmd.io/_uploads/SJaQGE5ih.png)
   - We can see that `packed_bcd_hw` and `packed_bcd_sw` match each other on every fourth change of `in_binary`.
3. Both 1 and 2 are correct.
   ![](https://hackmd.io/_uploads/rJLQQV9jn.png)

### DUT (Verilog)

#### Utilization

![](https://hackmd.io/_uploads/Hk5kA49oh.png)

- The Verilog design uses only LUTs, unlike the IP, which uses FFs for registers.
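
The digit-extraction logic shared by the kernel and the golden task can also be cross-checked in plain software. The following is a minimal sketch, assuming standard fixed-width integers in place of the `ap_uint` types; `get_digit_model` and `binary2bcd_div_model` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Return the lowest decimal digit of a, then drop that digit from a.
// Mirrors get_digit in the kernel: digit = a - 10 * (a / 10); a = a / 10.
uint8_t get_digit_model(uint16_t &a) {
    uint8_t digit = static_cast<uint8_t>(a - 10 * (a / 10));
    a = static_cast<uint16_t>(a / 10);
    return digit;
}

// Pack four decimal digits into 16 bits, least significant digit in bits 3..0.
// Only meaningful for inputs 0..9999 (four BCD digits).
uint16_t binary2bcd_div_model(uint16_t in_binary) {
    uint16_t a = in_binary;
    uint16_t packed = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t digit = get_digit_model(a);
        packed = static_cast<uint16_t>(packed | (digit << (4 * i)));
    }
    return packed;
}
```

For example, `binary2bcd_div_model(1234)` yields `0x1234`: each hexadecimal nibble of the result holds one decimal digit of the input.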
#### Waveform

![](https://hackmd.io/_uploads/ByVLpEqo3.png)
![](https://hackmd.io/_uploads/ryoI64qoh.png)

---------------------------------------------------

# 5. Binary-to-BCD (Double Dabble)

## Top Function (Kernel)

### In HLS

``` cpp=
// uint4, uint8 and uint16 are assumed to be typedefs of
// ap_uint<4>, ap_uint<8> and ap_uint<16>.

// One double dabble step: shift left, then add 3 to the ones digit if it is > 4.
void double_dabble(uint16 &scratch_pad) {
#pragma HLS INLINE
    scratch_pad = scratch_pad << 1;
    if (scratch_pad(11, 8) > 4)
        scratch_pad(11, 8) = scratch_pad(11, 8) + 3;
}

void binary2bcd_double_dabble(uint8 in_binary, uint16 *unpacked_bcd, uint8 *packed_bcd) {
#pragma HLS INTERFACE ap_none port=in_binary
#pragma HLS INTERFACE ap_none port=unpacked_bcd
#pragma HLS INTERFACE ap_none port=packed_bcd
#pragma HLS INTERFACE ap_ctrl_none port=return
    uint16 scratch_pad = in_binary;
    uint4 zero_4 = 0b0000;
    double_dabble(scratch_pad);
    double_dabble(scratch_pad);
    double_dabble(scratch_pad);
    double_dabble(scratch_pad);
    double_dabble(scratch_pad);
    double_dabble(scratch_pad);
    double_dabble(scratch_pad);
    // the last step only shifts
    scratch_pad = scratch_pad << 1;
    *packed_bcd = scratch_pad(15, 8);
    *unpacked_bcd = (zero_4, scratch_pad(15, 12), zero_4, scratch_pad(11, 8));
}
```

### In Verilog

``` verilog=
module binary2bcd_double_dabble(
    input  wire [7:0]  in_binary,     // binary input
    output reg  [15:0] unpacked_bcd,  // 16-bit bcd output
    output reg  [7:0]  packed_bcd     // 8-bit bcd output
);
    reg [15:0] num_list;
    reg [16:0] tmp_list;
    reg [7:0]  bcd;
    integer i;

    always @* begin
        num_list = {8'd0, in_binary};
        // shift and compare 7 times
        for (i = 0; i < 7; i = i + 1) begin
            tmp_list = num_list * 2;
            bcd = tmp_list[15:8];
            // is the ones digit > 4?
            if (bcd[3:0] > 3'd4) begin
                bcd = bcd + 2'd3;
                num_list = {bcd, tmp_list[7:0]};
            end
            else begin
                num_list = {bcd, tmp_list[7:0]};
            end
        end
        // The last time, we only need to shift.
        tmp_list = num_list * 2;
        bcd = tmp_list[15:8];
        num_list = {bcd, tmp_list[7:0]};
        // bcd output
        packed_bcd = num_list[15:8];
        unpacked_bcd = {4'd0, num_list[15:12], 4'd0, num_list[11:8]};
    end
endmodule
```

## Testbench

- Testbench in Verilog

``` verilog=
module VIC_TESTBENCH();
    reg  [7:0]  in_binary;  // binary input
    wire [15:0] up_bcd_hw;  // unpacked bcd number
    wire [7:0]  p_bcd_hw;   // packed bcd number
    reg  [15:0] up_bcd_sw;  // software
    reg  [7:0]  p_bcd_sw;   // software
    reg  [0:0]  error;      // error detect
    integer i;              // loop counter of the main function

    // design under test
    design_1_wrapper DUT0 (.in_binary(in_binary), .unpacked_bcd(up_bcd_hw), .packed_bcd(p_bcd_hw));
    /* binary2bcd_double_dabble DUT1 (.in_binary(in_binary),
                                      .unpacked_bcd(up_bcd_hw),
                                      .packed_bcd(p_bcd_hw)); */

    // double dabble in software
    task double_dabble;
        inout [15:0] scratch_pad;
        begin
            scratch_pad = scratch_pad * 2;
            if (scratch_pad[11:8] > 3'd4) begin
                scratch_pad[11:8] = scratch_pad[11:8] + 2'd3;
            end
        end
    endtask

    // golden
    task binary2bcd_double_dabble;
        input  [7:0]  in_binary;
        output [15:0] unpacked_bcd;
        output [7:0]  packed_bcd;
        reg    [15:0] scratch_pad;
        integer i;
        begin
            scratch_pad = {8'd0, in_binary};
            for (i = 0; i < 7; i = i + 1) begin
                double_dabble (scratch_pad);
            end
            scratch_pad = scratch_pad * 2;
            packed_bcd = scratch_pad[15:8];
            unpacked_bcd = {4'd0, scratch_pad[15:12], 4'd0, scratch_pad[11:8]};
        end
    endtask

    // main function of the testbench
    initial begin
        // initialize the value
        in_binary = 8'd0;
        error = 1'b0;
        // check every 2-digit value, 0..99
        for (i = 0; i < 100; i = i + 1) begin
            binary2bcd_double_dabble (in_binary, up_bcd_sw, p_bcd_sw);
            #5
            if (up_bcd_sw != up_bcd_hw) begin
                $display("Error at %d, bcd_sw = %b, bcd_hw = %b", in_binary, p_bcd_sw, p_bcd_hw);
                error = 1'b1;  // record the mismatch
            end
            #5
            in_binary = in_binary + 1'b1;
        end
        if (error == 1'b1) begin
            $display("Error!!");
        end
        else begin
            $display("Test Passed!!!");
        end
    end
endmodule
```

### DUT (IP)

- Export RTL IP and create a block design

![](https://hackmd.io/_uploads/H1UPd99i3.png)

#### Utilization

![](https://hackmd.io/_uploads/r1JPd9cjh.png)

- The design contains only 5 LUTs.
#### Waveform

![](https://hackmd.io/_uploads/ByMK_qcsh.png)
![](https://hackmd.io/_uploads/HJtF_5qsn.png)

### DUT (Verilog)

#### Utilization

![](https://hackmd.io/_uploads/rkT3uq5i3.png)

- The design contains 9 LUTs, more than the IP.

#### Waveform

![](https://hackmd.io/_uploads/H1LAO99on.png)
![](https://hackmd.io/_uploads/rJA0d55i3.png)
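
The shift-and-add-3 steps of the double dabble kernel can be modelled in plain C++ and compared against ordinary division. This is a minimal sketch under stated assumptions: it uses `uint16_t` instead of `ap_uint<16>`, it only covers inputs 0..99 (two BCD digits, as in the kernel), and `double_dabble_model` and `double_dabble_matches_division` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdint>

// Software model of the kernel: 7 "shift, then add 3 if ones digit > 4" steps,
// followed by a final shift. Bits 11..8 hold the ones digit and bits 15..12
// the tens digit, as in scratch_pad. Valid for inputs 0..99.
uint8_t double_dabble_model(uint8_t in_binary) {
    uint16_t scratch = in_binary;  // binary value starts in bits 7..0
    for (int i = 0; i < 7; i++) {
        scratch = static_cast<uint16_t>(scratch << 1);
        uint16_t ones = (scratch >> 8) & 0xFu;
        if (ones > 4) {
            scratch = static_cast<uint16_t>(scratch + (3u << 8));
        }
    }
    scratch = static_cast<uint16_t>(scratch << 1);  // last step only shifts
    return static_cast<uint8_t>(scratch >> 8);      // packed BCD: tens, ones
}

// Compare the model with a division-based packed BCD for every value 0..99.
bool double_dabble_matches_division() {
    for (int v = 0; v < 100; v++) {
        uint8_t expected = static_cast<uint8_t>(((v / 10) << 4) | (v % 10));
        if (double_dabble_model(static_cast<uint8_t>(v)) != expected) {
            return false;
        }
    }
    return true;
}
```

For example, `double_dabble_model(99)` returns `0x99` and `double_dabble_model(10)` returns `0x10`. Only the ones digit is corrected, which is sufficient here because the tens digit never exceeds 4 before the final shift when the input is at most 99.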