[TOC]
# 1. Parity Generator
## Top Function (Kernel)
### In HLS
``` cpp=
#include "ap_int.h"
#define W 16 // assumed width, matching the 16-bit Verilog version
// ap_uint<W> a means "a" is a W-bit unsigned integer
bool parity_generator(ap_uint<W> a) {
#pragma HLS INTERFACE ap_none port=a
#pragma HLS INTERFACE ap_ctrl_none port=return
bool parity = 0;
for (int i = 0; i < W; i++){
#pragma HLS UNROLL
parity = parity ^ a[i];
}
return parity;
}
```
#### Without UNROLL
The following is the logic generated when `#pragma HLS UNROLL` is not added to unroll the loop.

- Without unrolling, the loop becomes a circuit with feedback. Such a circuit incorporates some form of memory or state storage, which allows it to consider past inputs or outputs when generating the current output.
- These circuits are known as sequential logic circuits.

The following are the utilization and the dataflow when the loop is not unrolled.
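
The feedback described above can be sketched in plain C++ (a software analogy, not HLS code and not the generated RTL): a single XOR gate feeds a one-bit state register that is updated once per clock cycle. The names `ParityFsm`, `step`, and `parity_sequential` are hypothetical, and the width is assumed to be 16 bits as in the Verilog version.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the rolled loop: one XOR gate plus a 1-bit register.
struct ParityFsm {
    bool state = false;                            // memory in the feedback path
    void step(bool bit) { state = state ^ bit; }   // one clock cycle
};

// Feeds the 16 input bits one per cycle and reports how many cycles were used.
inline bool parity_sequential(std::uint16_t a, int &cycles) {
    ParityFsm fsm;
    cycles = 0;
    for (int i = 0; i < 16; ++i) {
        fsm.step(((a >> i) & 1u) != 0);
        ++cycles;
    }
    return fsm.state;
}

// Small self-check: the result is only valid after all 16 cycles.
inline void check_parity_sequential() {
    int cycles = 0;
    assert(parity_sequential(0x0001, cycles) == true);
    assert(cycles == 16);
    assert(parity_sequential(0x0003, cycles) == false);
}
```

The point of the sketch is that the output depends on the stored `state`, which is what makes the circuit sequential rather than combinational.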

#### Loop Unrolling
We can unroll the loop manually.

- The green code is one way of manually unrolling the loop.
- Unrolling a loop creates multiple independent operations rather than a single collection of operations.
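
As a rough illustration of manual unrolling, the sketch below (plain C++, assuming a 16-bit input) splits the loop into four partial XORs that do not depend on each other and combines them at the end. It is one possible way to unroll by hand and is not necessarily the code shown in the figure; `bit`, `parity_loop`, and `parity_unrolled` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Extracts bit i of a 16-bit word.
inline bool bit(std::uint16_t a, int i) { return ((a >> i) & 1u) != 0; }

// Loop form: every iteration depends on the previous value of parity.
inline bool parity_loop(std::uint16_t a) {
    bool parity = false;
    for (int i = 0; i < 16; ++i) parity = parity ^ bit(a, i);
    return parity;
}

// Manually unrolled form: four partial results with no dependency
// between them, combined in a final step.
inline bool parity_unrolled(std::uint16_t a) {
    bool p0 = bit(a, 0)  ^ bit(a, 1)  ^ bit(a, 2)  ^ bit(a, 3);
    bool p1 = bit(a, 4)  ^ bit(a, 5)  ^ bit(a, 6)  ^ bit(a, 7);
    bool p2 = bit(a, 8)  ^ bit(a, 9)  ^ bit(a, 10) ^ bit(a, 11);
    bool p3 = bit(a, 12) ^ bit(a, 13) ^ bit(a, 14) ^ bit(a, 15);
    return (p0 ^ p1) ^ (p2 ^ p3);
}

// Exhaustive comparison of both forms over all 65536 inputs.
inline void check_parity_unrolled() {
    for (unsigned v = 0; v <= 0xFFFFu; ++v) {
        std::uint16_t a = static_cast<std::uint16_t>(v);
        assert(parity_unrolled(a) == parity_loop(a));
    }
}
```

Because `p0` to `p3` are independent, a synthesis tool is free to compute them in parallel.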
#### **#pragma HLS UNROLL**
We can unroll the loop automatically by adding `#pragma HLS UNROLL`,

and it will create a balanced tree structure as follows:

- The pragma transforms a loop by creating multiple copies of the loop body in the RTL design, which allows some or all loop iterations to occur in parallel.
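
The shape of a balanced XOR tree can be sketched in plain C++ as a series of folds. This is a software analogy for the tree structure under the assumption of a 16-bit input, not the RTL that HLS emits; `parity_tree` and `parity_chain` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Balanced XOR tree: each fold halves the number of values left to combine,
// so 16 bits need log2(16) = 4 levels instead of a 16-deep chain.
inline bool parity_tree(std::uint16_t a) {
    unsigned x = a;
    x ^= x >> 8;   // level 1: 16 -> 8 partial parities
    x ^= x >> 4;   // level 2: 8 -> 4
    x ^= x >> 2;   // level 3: 4 -> 2
    x ^= x >> 1;   // level 4: 2 -> 1, result in bit 0
    return (x & 1u) != 0;
}

// Reference chain: same structure as the golden model with W = 16.
inline bool parity_chain(std::uint16_t a) {
    bool parity = false;
    for (int i = 0; i < 16; ++i) parity = parity ^ (((a >> i) & 1u) != 0);
    return parity;
}

// Exhaustive comparison of tree and chain over all 65536 inputs.
inline void check_parity_tree() {
    for (unsigned v = 0; v <= 0xFFFFu; ++v) {
        std::uint16_t a = static_cast<std::uint16_t>(v);
        assert(parity_tree(a) == parity_chain(a));
    }
}
```

Both functions compute the same value; only the depth of the dependency chain differs, which is what matters for combinational delay.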
#### .v file in HLS:
- Because we added `#pragma HLS UNROLL` to our design, HLS creates the balanced tree structure described above:


- The following are the utilization and the dataflow with `#pragma HLS UNROLL` added to our design.

#### Parity Generator in Testbench(golden in HLS)
``` cpp=
bool parity_generator_golden(ap_uint<W> a) {
bool parity = 0;
for (int i = 0; i < W; i++) {
parity = parity ^ a[i];
}
return parity;
}
```
#### Result:
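- The kernel generates an even-parity bit: the output is 1 when `a` contains an odd number of 1s, so the total number of 1s including the parity bit is even.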
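
To illustrate what "even parity" means here, the following plain C++ sketch appends the parity bit to the data and checks that the resulting codeword always has an even number of 1s. The 17-bit codeword layout and the names `parity_bit`, `with_parity`, and `count_ones` are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstdint>

// XOR of the 16 data bits, as in the kernel (assuming W = 16).
inline bool parity_bit(std::uint16_t a) {
    bool parity = false;
    for (int i = 0; i < 16; ++i) parity = parity ^ (((a >> i) & 1u) != 0);
    return parity;
}

// Hypothetical 17-bit codeword: parity bit in bit 16, data in bits 15..0.
inline std::uint32_t with_parity(std::uint16_t a) {
    return (static_cast<std::uint32_t>(parity_bit(a)) << 16) | a;
}

// Counts the 1 bits in a 32-bit value.
inline int count_ones(std::uint32_t v) {
    int n = 0;
    for (; v != 0; v >>= 1) n += static_cast<int>(v & 1u);
    return n;
}

// Every codeword must contain an even number of 1s.
inline void check_even_parity() {
    for (unsigned v = 0; v <= 0xFFFFu; ++v) {
        assert(count_ones(with_parity(static_cast<std::uint16_t>(v))) % 2 == 0);
    }
}
```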

### In Verilog
- Here is the kernel written in Verilog:
``` verilog=
module parity_generator(
input wire [15:0] a,
output reg [0:0] parity
);
integer i;
always @* begin
parity = 1'b0;
for (i = 0; i < 16; i = i + 1) begin
parity = parity ^ a[i];
end
end
endmodule
```
## Testbench
- Testbench in Verilog
``` verilog=
module vic_group3_testbench();
wire [0:0] parity_hw;
reg [0:0] parity_sw;
reg [15:0] a_hw;
// Design under test
design_1_wrapper DUT0 (.a(a_hw), .parity(parity_hw));
// parity_generator DUT1 (.a(a_hw), .parity(parity_hw));
reg [15:0] tmp;
integer i, j;
```
- Golden parity generator in the testbench
``` verilog=13
task parity_generator_sw;
input [15:0] a;
output [0:0] parity_sw;
begin
parity_sw = 1'b0;
for (i = 0; i < 16; i = i+1) begin
parity_sw = parity_sw ^ a[i];
end
end
endtask
```
- Main function of testbench
``` verilog=24
initial begin
tmp = 16'd0; // initialize the value
for (j = 0; j < 65536; j = j + 1) begin
a_hw = tmp;
// Get the value of golden
parity_generator_sw(tmp, parity_sw);
// Compare the value after the short delay
#5 if (parity_hw != parity_sw) begin
$display("Error at %b, parity_sw = %b, parity_hw = %b", tmp, parity_sw, parity_hw);
end
else begin
// $display("%b, parity_sw = %b, parity_hw = %b", tmp, parity_sw, parity_hw);
end
// tmp + 1 after a short delay to avoid the mismatch of output
#5 tmp = tmp + 1'b1;
end
end
endmodule
```
### DUT (IP)
- Export RTL IP and Create Block Design in Vivado

#### Utilization

- Only 3 LUTs are used.
#### Waveform

- The hardware and golden values are the same for all inputs.
### DUT (Verilog)
#### Utilization

#### Waveform

-------------------------------------------------------
# 2. Leading One
## Top Function (Kernel)
### In HLS

#### Utilization & Dataflow in HLS
- With a clock period of 10 ns, HLS may turn the design into sequential logic.

- Here are the utilization and dataflow:



- **So, the solution is to synthesize with a longer clock period.**

- Then, the utilization and dataflow in HLS are as follows:

- We can see that the design uses only LUTs (combinational).

- The following is the golden `leading_one` in the testbench.
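
The expected behaviour can be sketched in plain C++ as follows. This is a model derived from the Verilog version later in this section, not the original HLS testbench code: it assumes a 9-bit input, returns the index of the most significant 1, and returns 31 (all ones in 5 bits, matching `5'd0 - 1'b1`) when no bit is set. The name `leading_one_model` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns the index (0..8) of the most significant 1 in the low 9 bits of a.
// Returns 31 (5'b11111) when none of those bits is set.
inline std::uint8_t leading_one_model(std::uint16_t a) {
    for (int i = 8; i >= 0; --i) {
        if (((a >> i) & 1u) != 0) {
            return static_cast<std::uint8_t>(i);   // first hit from the top wins
        }
    }
    return 0x1F;
}

// Spot checks against values that can be read off the if/else chain.
inline void check_leading_one_model() {
    assert(leading_one_model(0x100) == 8);
    assert(leading_one_model(0x001) == 0);
    assert(leading_one_model(0x000) == 31);
}
```

Scanning from bit 8 downwards mirrors the priority order of the `if`/`else if` chain in the Verilog kernel.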

#### Result

### In Verilog
``` verilog=
module leading_one(
input wire [8:0] a,
output reg [4:0] one
);
always @* begin
if (a[8] == 1'b1) begin
one = 5'd8;
end
else if (a[7] == 1'b1) begin
one = 5'd7;
end
else if (a[6] == 1'b1) begin
one = 5'd6;
end
else if (a[5] == 1'b1) begin
one = 5'd5;
end
else if (a[4] == 1'b1) begin
one = 5'd4;
end
else if (a[3] == 1'b1) begin
one = 5'd3;
end
else if (a[2] == 1'b1) begin
one = 5'd2;
end
else if (a[1] == 1'b1) begin
one = 5'd1;
end
else if (a[0] == 1'b1) begin
one = 5'd0;
end
else
one = 5'd0 - 1'b1;
end
endmodule
```
## Testbench
- Testbench in Verilog
``` verilog=
module vic_group3_testbench();
wire [4:0] one_hw;
reg [4:0] one_sw;
reg [8:0] a_hw;
// design under test
design_1_wrapper DUT0 (.a(a_hw), .leading_one(one_hw));
// leading_one DUT1 (.a(a_hw), .one(one_hw));
reg [8:0] tmp;
integer i;
```
- Write a golden leading_one_sw in the testbench
``` verilog=13
task leading_one_sw;
input [8:0] a;
output [4:0] leading_one_sw;
begin
if (a[8] == 1'b1) begin
leading_one_sw = 5'd8;
end
else if (a[7] == 1'b1) begin
leading_one_sw = 5'd7;
end
else if (a[6] == 1'b1) begin
leading_one_sw = 5'd6;
end
else if (a[5] == 1'b1) begin
leading_one_sw = 5'd5;
end
else if (a[4] == 1'b1) begin
leading_one_sw = 5'd4;
end
else if (a[3] == 1'b1) begin
leading_one_sw = 5'd3;
end
else if (a[2] == 1'b1) begin
leading_one_sw = 5'd2;
end
else if (a[1] == 1'b1) begin
leading_one_sw = 5'd1;
end
else if (a[0] == 1'b1) begin
leading_one_sw = 5'd0;
end
else begin
leading_one_sw = 5'd0 - 1'b1;
end
end
endtask
```
- Then, the main function of the testbench.
``` verilog=50
initial begin
tmp = 9'd0; // initialize the input value (otherwise tmp starts as x)
for (i = 0; i < 512; i = i + 1) begin
a_hw = tmp;
// get the value from golden
leading_one_sw(tmp, one_sw);
// compare the value after a short delay in order to get the right value
#5 if (one_hw != one_sw) begin
$display("Error at %b, leading_one_sw = %b, leading_one_hw = %b", tmp, one_sw, one_hw);
end
else begin
$display("%b, leading_one_sw = %b, leading_one_hw = %b", tmp, one_sw, one_hw);
end
// update the tmp after a short delay to avoid the mismatch of output and input
#5 tmp = tmp + 1'b1;
end
end
endmodule
```
### DUT (IP)
- Export RTL IP and Create Block Design in Vivado

#### Utilization

#### Waveform

### DUT (Verilog)
#### Utilization

#### Waveform

----------------------------------------------------
# 3. Integer Division & Modulus
## Top Function (Kernel)
### In HLS
``` cpp=
const int n = 1234101;
int divbyconstant(int a) {
#pragma HLS INLINE off
return a / n;
}
void integer_division_modulus(int a, int &r) {
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE ap_none port=a
#pragma HLS INTERFACE ap_none port=r
r = a - n * divbyconstant(a);
}
```
- It generates the remainder of `a` divided by the constant `n`, computed as `a - n * (a / n)`.
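
The identity behind this computation can be checked in plain C++: with truncating integer division, `a - n * (a / n)` equals `a % n`. The sketch below uses the same constant as the kernel; `N_CONST`, `remainder_by_sub`, and `check_remainder` are hypothetical names. Note that it uses signed `int` like the HLS code, whereas the Verilog version treats the input as unsigned, so the two only agree for non-negative inputs.

```cpp
#include <cassert>

const int N_CONST = 1234101;   // same constant as n in the kernel

// Remainder via divide, multiply, and subtract, as in the kernel.
inline int remainder_by_sub(int a) {
    return a - N_CONST * (a / N_CONST);
}

// The result matches the built-in % operator, including for negative inputs,
// because C++ integer division truncates toward zero.
inline void check_remainder() {
    assert(remainder_by_sub(N_CONST) == 0);
    assert(remainder_by_sub(N_CONST + 5) == 5);
    assert(remainder_by_sub(1234100) == 1234100);
    assert(remainder_by_sub(-7) == -7 % N_CONST);
    for (int a = -5000000; a <= 5000000; a += 997) {
        assert(remainder_by_sub(a) == a % N_CONST);
    }
}
```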
#### Utilization in HLS

- The arithmetic is still combinational, since it is built only from DSPs and LUTs.
- In addition, 17 FFs are used as registers.
### In Verilog
``` verilog=
`define N 32'd1234101
module integer_division_modulus(
input wire [31:0] a_in,
output reg [31:0] r_out
);
reg [31:0] tmp;
function [31:0] divbyconst;
input [31:0] a_in;
begin
divbyconst = a_in / `N;
end
endfunction
always @* begin
tmp = divbyconst (a_in);
r_out = a_in - (`N * tmp);
end
endmodule
```
## Testbench
- Testbench in Verilog
``` verilog=
`define N 32'd1234101
module VIC_TESTBENCH();
reg clk; // 10ns clk;
reg rst; // active high, synchronous reset
reg [31:0] a_in; // input of kernel and golden
wire [31:0] r_hw; // remainder of Kernel
reg [31:0] r_sw; // remainder of Golden
reg [0:0] error_detect; // detect the error, if 1, error
integer i;
// design under test
design_1_wrapper DUT (.ap_clk(clk),
.ap_rst(rst),
.a(a_in),
.r(r_hw));
// integer_division_modulus (.a_in(a_in),
// .r_out(r_hw));
// 10ns clk generator
initial begin
clk = 1'b1;
forever begin
#5 clk = ~clk;
end
end
// divide by constant
task divbyconst;
input [31:0] a_in;
output [31:0] d_out;
begin
d_out = a_in / `N;
end
endtask
// golden integer_division_modulus
task golden_integer_division_modulus;
input [31:0] a_in; // the 32-bit input
output [31:0] r_out; // the 32-bit output remainder
reg [31:0] tmp; // temporary value that receives the output of divbyconst
begin
tmp = 32'd0; // initialize the tmp
divbyconst(a_in, tmp);
r_out = a_in - (`N * tmp);
end
endtask
// main function of the testbench
initial begin
// initialize the value of reset, input, error_detect
rst = 1'b1;
a_in = `N;
error_detect = 1'b0;
// rst falls after 50 ns, since the kernel produces no output during the first 40 ns
#50 @(posedge clk) rst = 1'b0;
// Compare the value of kernel and golden
for (i = 0; i < 3000; i = i + 1) begin
@(posedge clk) golden_integer_division_modulus(a_in, r_sw);
#5
if (r_sw != r_hw) begin
$display("Error at %d, r_sw = %d, r_hw = %d", a_in, r_sw, r_hw);
error_detect = 1'b1;
end
else begin
/* $display("%d, r_sw = %d, r_hw = %d", a_in, r_sw, r_hw); */
end
#4
a_in = a_in + 1'b1;
end
// Check the error
if (error_detect != 1'b0) begin
$display("Failed!!!");
end
else begin
$display("Test Passed!!");
end
end
endmodule
```
### DUT (IP)
- Export RTL IP and create block design in Vivado

#### Utilization

- 5 FFs for registers
#### Waveform

### DUT (Verilog)
#### Utilization

- The Verilog design uses only LUTs, unlike the HLS design, which contains FFs for registers.
#### Waveform

- We can see that the output `r_hw` of the Verilog design changes at the same time as the input `a_in`, since it is purely combinational logic built from LUTs.
--------------------------------------------------------
# 4. Binary-to-BCD (Division)
## Top Function (Kernel)
### In HLS
``` cpp=
void get_digit(uint14 &a, uint4 &digit) {
digit = a - 10 * (a / 10);
a = a / 10;
}
void binary2bcd_div(uint14 in_binary, uint16 *packed_bcd) {
#pragma HLS INTERFACE ap_none port=in_binary
#pragma HLS INTERFACE ap_none port=packed_bcd
#pragma HLS INTERFACE ap_ctrl_none port=return
uint14 a = in_binary;
uint4 digit_0;
uint4 digit_1;
uint4 digit_2;
uint4 digit_3;
get_digit(a, digit_0);
get_digit(a, digit_1);
get_digit(a, digit_2);
get_digit(a, digit_3);
*packed_bcd = (digit_3, digit_2, digit_1, digit_0);
}
```
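
As a worked example of the digit-peeling idea, here is a minimal plain C++ sketch that uses standard integer types instead of the HLS `uint14`/`uint4` types. The names `get_digit_model` and `binary2bcd_div_model` are hypothetical. Like the kernel, it extracts only four digits, so it is meaningful for inputs from 0 to 9999.

```cpp
#include <cassert>
#include <cstdint>

// Peels off the least significant decimal digit of a, then divides a by 10.
inline unsigned get_digit_model(unsigned &a) {
    unsigned digit = a - 10 * (a / 10);   // same as a % 10
    a = a / 10;
    return digit;
}

// Packs four decimal digits, 4 bits each, most significant digit on top.
inline std::uint16_t binary2bcd_div_model(std::uint16_t in_binary) {
    unsigned a = in_binary;
    unsigned d0 = get_digit_model(a);   // ones
    unsigned d1 = get_digit_model(a);   // tens
    unsigned d2 = get_digit_model(a);   // hundreds
    unsigned d3 = get_digit_model(a);   // thousands
    return static_cast<std::uint16_t>((d3 << 12) | (d2 << 8) | (d1 << 4) | d0);
}

// In packed BCD, the hex digits of the result spell the decimal input.
inline void check_binary2bcd_div_model() {
    assert(binary2bcd_div_model(1234) == 0x1234);
    assert(binary2bcd_div_model(9999) == 0x9999);
    assert(binary2bcd_div_model(42) == 0x0042);
}
```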
### In Verilog
``` verilog=
module binary2bcd_div(
input wire [13:0] in_binary,
output reg [15:0] packed_bcd
);
reg [13:0] a;
reg [3:0] digit_0;
reg [3:0] digit_1;
reg [3:0] digit_2;
reg [3:0] digit_3;
function [3:0] get_digit;
input [13:0] a;
begin
get_digit = a - 10 * (a / 10);
end
endfunction
always @* begin
a = in_binary;
digit_0 = get_digit(a);
a = a / 10;
digit_1 = get_digit(a);
a = a / 10;
digit_2 = get_digit(a);
a = a / 10;
digit_3 = get_digit(a);
a = a / 10;
packed_bcd = {digit_3, digit_2, digit_1, digit_0};
end
endmodule
```
## Testbench
- Testbench in Verilog
``` verilog=
module VIC_TESTBENCH();
reg clk;
reg rst;
reg [13:0] in_binary;
wire [15:0] packed_bcd_hw;
reg [15:0] packed_bcd_sw;
reg [0:0] error_detect;
integer i;
// design under test
design_1_wrapper DUT0 (.ap_clk(clk),
.ap_rst(rst),
.in_binary(in_binary),
.packed_bcd(packed_bcd_hw));
/*
binary2bcd_div DUT1 (.in_binary(in_binary),
.packed_bcd(packed_bcd_hw));
*/
// 10ns clk generator
initial begin
clk = 1'b1;
forever begin
#5 clk = ~clk;
end
end
// Golden get_digit
task get_digit;
input [13:0] a;
output [3:0] digit;
begin
digit = a - 10 * (a / 10);
end
endtask
// golden
task binary2bcd_sw;
input [13:0] in_binary;
output [15:0] packed_bcd;
reg [13:0] a;
reg [3:0] digit_0;
reg [3:0] digit_1;
reg [3:0] digit_2;
reg [3:0] digit_3;
begin
a = in_binary;
get_digit (a, digit_0);
a = a / 10;
get_digit (a, digit_1);
a = a / 10;
get_digit (a, digit_2);
a = a / 10;
get_digit (a, digit_3);
a = a / 10;
packed_bcd = {digit_3, digit_2, digit_1, digit_0};
end
endtask
//main function of the testbench
initial begin
rst = 1'b1;
in_binary = 14'd0;
error_detect = 1'b0;
#30 @(posedge clk) rst = 1'b0;
for (i = 0; i < 9999; i = i + 1) begin
@(posedge clk)
binary2bcd_sw (in_binary, packed_bcd_sw);
#80
if (packed_bcd_sw != packed_bcd_hw) begin
$display("Error at %d, bcd_sw = %b, bcd_hw = %b", in_binary, packed_bcd_sw, packed_bcd_hw);
error_detect = 1'b1;
end
#0 in_binary = in_binary + 1'b1;
end
#5
if (error_detect == 1'b0) begin
$display("Test Passed");
end
else begin
$display("Error!!!");
end
end
endmodule
```
### DUT (IP)
- Export RTL IP and create a block design

#### Utilization

- There are 9 FFs for registers.
#### Waveform
1. This is the waveform of the IP when I change `in_binary` every 50 ns, with a clock period of 10 ns.

- We can see that every fourth `packed_bcd_hw` value lasts longer than the others (40 ns longer).
2. Then, I change `in_binary` every 90 ns.

- We can see that `packed_bcd_hw` and `packed_bcd_sw` match each other in every 4 times we change the `in_binary`.
3. Both 1 and 2 are correct.
### DUT (Verilog)
#### Utilization

- The Verilog design uses only LUTs, unlike the IP, which uses FFs for registers.
#### Waveform


---------------------------------------------------
# 5. Binary-to-BCD (Double Dabble)
## Top Function (Kernel)
### In HLS
``` cpp=
void double_dabble(uint16 &scratch_pad) {
#pragma HLS INLINE
scratch_pad = scratch_pad << 1;
if (scratch_pad(11, 8) > 4)
scratch_pad(11, 8) = scratch_pad(11, 8) + 3;
}
void binary2bcd_double_dabble(uint8 in_binary, uint16 *unpacked_bcd, uint8 *packed_bcd) {
#pragma HLS INTERFACE ap_none port=in_binary
#pragma HLS INTERFACE ap_none port=unpacked_bcd
#pragma HLS INTERFACE ap_none port=packed_bcd
#pragma HLS INTERFACE ap_ctrl_none port=return
uint16 scratch_pad = in_binary;
uint4 zero_4 = 0b0000;
double_dabble(scratch_pad);
double_dabble(scratch_pad);
double_dabble(scratch_pad);
double_dabble(scratch_pad);
double_dabble(scratch_pad);
double_dabble(scratch_pad);
double_dabble(scratch_pad);
scratch_pad = scratch_pad << 1;
*packed_bcd = scratch_pad(15, 8);
*unpacked_bcd = (zero_4, scratch_pad(15, 12), zero_4, scratch_pad(11, 8));
}
```
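
The shift-and-add-3 steps can be traced with a plain C++ model that uses standard integer types in place of the HLS bit-range operators. The names `double_dabble_step` and `binary2bcd_dd_model` are hypothetical. Like the kernel, the sketch corrects only the ones nibble (bits 11..8), so it is meant for inputs from 0 to 99.

```cpp
#include <cassert>
#include <cstdint>

// One shift-and-correct step on the 16-bit scratch pad.
// Bits 15..8 hold the BCD digits, bits 7..0 the remaining binary input.
inline std::uint16_t double_dabble_step(std::uint16_t pad) {
    unsigned p = (static_cast<unsigned>(pad) << 1) & 0xFFFFu;   // shift left by 1
    unsigned ones = (p >> 8) & 0xFu;                            // bits 11..8
    if (ones > 4) {
        ones = (ones + 3) & 0xFu;                               // add 3, keep 4 bits
        p = (p & ~0x0F00u) | (ones << 8);
    }
    return static_cast<std::uint16_t>(p);
}

// Seven shift-and-correct steps, then a final plain shift.
inline std::uint8_t binary2bcd_dd_model(std::uint8_t in_binary) {
    std::uint16_t pad = in_binary;
    for (int i = 0; i < 7; ++i) pad = double_dabble_step(pad);
    pad = static_cast<std::uint16_t>((static_cast<unsigned>(pad) << 1) & 0xFFFFu);
    return static_cast<std::uint8_t>(pad >> 8);                 // packed BCD
}

// For 0..99 the packed result must be (tens << 4) | ones.
inline void check_binary2bcd_dd_model() {
    for (unsigned v = 0; v <= 99; ++v) {
        unsigned expected = ((v / 10) << 4) | (v % 10);
        assert(binary2bcd_dd_model(static_cast<std::uint8_t>(v)) == expected);
    }
}
```

For example, an input of 42 (`0b00101010`) ends with `0x42` in the upper byte of the scratch pad after the final shift.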
### In Verilog
``` verilog=
module binary2bcd_double_dabble(
input wire [7:0] in_binary, // binary input
output reg [15:0] unpacked_bcd, // 16-bit bcd output
output reg [7:0] packed_bcd // 8-bit bcd output
);
reg [15:0] num_list;
reg [16:0] tmp_list;
reg [7:0] bcd;
integer i;
always @* begin
num_list = {8'd0, in_binary};
// shift and compare 7 times
for (i = 0; i < 7; i = i + 1) begin
tmp_list = num_list * 2;
bcd = tmp_list[15:8];
// whether > 4?
if (bcd[3:0] > 3'd4) begin
bcd = bcd + 2'd3;
num_list = {bcd, tmp_list[7:0]};
end
else begin
num_list = {bcd, tmp_list[7:0]};
end
end
// In the last time, only need to shift.
tmp_list = num_list * 2;
bcd = tmp_list[15:8];
num_list = {bcd, tmp_list[7:0]};
// bcd output
packed_bcd = num_list[15:8];
unpacked_bcd = {4'd0, num_list[15:12], 4'd0, num_list[11:8]};
end
endmodule
```
## Testbench
- Testbench in Verilog
``` verilog=
module VIC_TESTBENCH();
reg [7:0] in_binary; // binary input
wire [15:0] up_bcd_hw; // unpacked bcd number
wire [7:0] p_bcd_hw; // packed bcd number
reg [15:0] up_bcd_sw; // software
reg [7:0] p_bcd_sw; // software
reg [0:0] error; // error detect
integer i; // loop index used by the main initial block
// design under test
design_1_wrapper DUT0 (.in_binary(in_binary),
.unpacked_bcd(up_bcd_hw),
.packed_bcd(p_bcd_hw));
/*
binary2bcd_double_dabble DUT1 (.in_binary(in_binary),
.unpacked_bcd(up_bcd_hw),
.packed_bcd(p_bcd_hw));
*/
// double dabble in software
task double_dabble;
inout [15:0] scratch_pad;
begin
scratch_pad = scratch_pad * 2;
if (scratch_pad[11:8] > 3'd4) begin
scratch_pad[11:8] = scratch_pad[11:8] + 2'd3;
end
end
endtask
// golden
task binary2bcd_double_dabble;
input [7:0] in_binary;
output [15:0] unpacked_bcd;
output [7:0] packed_bcd;
reg [15:0] scratch_pad;
integer i;
begin
scratch_pad = {8'd0, in_binary};
for (i = 0; i < 7; i = i + 1) begin
double_dabble (scratch_pad);
end
scratch_pad = scratch_pad * 2;
packed_bcd = scratch_pad[15:8];
unpacked_bcd = {4'd0, scratch_pad[15:12], 4'd0, scratch_pad[11:8]};
end
endtask
// main function of the testbench
initial begin
// initialize the value
in_binary = 8'd0;
error = 1'b0;
for (i = 0; i < 99; i = i + 1) begin
binary2bcd_double_dabble (in_binary, up_bcd_sw, p_bcd_sw);
#5
if (up_bcd_sw != up_bcd_hw) begin
$display("Error at %d, bcd_sw = %b, bcd_hw = %b", in_binary, p_bcd_sw, p_bcd_hw);
error = 1'b1; // record the mismatch so the final check can report it
end
#5
in_binary = in_binary + 1'b1;
end
if (error == 1'b1) begin
$display("Error!!");
end
else begin
$display("Test Passed!!!");
end
end
endmodule
```
### DUT (IP)
- Export RTL IP and create a block design

#### Utilization

- The design contains only 5 LUTs.
#### Waveform


### DUT (Verilog)
#### Utilization

- The design contains 9 LUTs, more than the IP.
#### Waveform

