# CVSD Final
* Complex Multiplication

* Interpolation: square root / square
* Q can be overwritten by H (but R needs extra storage)
* test
* https://core.ac.uk/download/pdf/79572487.pdf
## Overview



### Systolic array with scheduling

### Approximate $1/\sqrt{x}$ with piecewise-linear approximation
* unsigned 4.16 --> sqrt --> signed 3.16
* unsigned 4.16 --> sqrt_inv --> signed 9.10

MATLAB simulation with 4.16-3.17 sqrt for sqrt and 1/sqrt:
* 0.97788%
* 1.2372%
* 0.95965%
* 0.65706%
* 0.77134%
* 0.84888%

MATLAB simulation with 4.16-3.17 sqrt and 4.16-9.11 invsqrt:
* 0.98197%
* 1.222%
* 0.96625%
* 0.665%
* 0.78001%
* 0.84914%
* https://www.shironekolabs.com/posts/efficient-approximate-square-roots-and-division-in-verilog/?fbclid=IwAR1NVgOFTZ1HML9NAxwSL2QynBOzAb9ichpPTaTbGc0jxap9cOg_GqkjnUY
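
The fixed-point formats above can be tried out in a small software model. The following is a minimal sketch, not the design's actual lookup table: the segment width (`SEG_SHIFT`), the restriction to inputs in [1, 16), and the endpoint-interpolated table are all assumptions made for illustration. Only the unsigned 4.16 input and the 10 fractional output bits come from the notes.

```python
import math

IN_FRAC = 16            # input: unsigned 4.16 (from the notes)
OUT_FRAC = 10           # output: 10 fractional bits, as in signed 9.10 (from the notes)
SEG_SHIFT = 14          # assumption: each segment is 2**-2 wide, indexed by upper bits
X_MIN = 1 << IN_FRAC    # assumption: this sketch only handles 1 <= x < 16
X_MAX = 16 << IN_FRAC


def build_table():
    """Return one (y0, slope) pair per segment, with endpoints on the exact curve."""
    n_seg = (X_MAX - X_MIN) >> SEG_SHIFT
    width = (1 << SEG_SHIFT) / (1 << IN_FRAC)
    table = []
    for i in range(n_seg):
        x0 = 1.0 + i * width
        y0 = 1.0 / math.sqrt(x0)
        y1 = 1.0 / math.sqrt(x0 + width)
        table.append((y0, (y1 - y0) / width))
    return table


def inv_sqrt_pwl(x_fixed, table):
    """Approximate 1/sqrt(x) for a 4.16 integer input; returns an integer with 10 fractional bits."""
    if not X_MIN <= x_fixed < X_MAX:
        raise ValueError("sketch only covers 1 <= x < 16")
    offset = x_fixed - X_MIN
    idx = offset >> SEG_SHIFT                                   # segment index from the upper bits
    dx = (offset & ((1 << SEG_SHIFT) - 1)) / (1 << IN_FRAC)     # distance into the segment
    y0, slope = table[idx]
    return round((y0 + slope * dx) * (1 << OUT_FRAC))


def max_relative_error(table, stride=997):
    """Largest relative error over a strided sweep of the supported input range."""
    worst = 0.0
    for x_fixed in range(X_MIN, X_MAX, stride):
        exact = 1.0 / math.sqrt(x_fixed / (1 << IN_FRAC))
        approx = inv_sqrt_pwl(x_fixed, table) / (1 << OUT_FRAC)
        worst = max(worst, abs(approx - exact) / exact)
    return worst


table = build_table()
print(len(table), max_relative_error(table))
```

Taking the segment index from the upper input bits keeps the lookup a plain bit slice, which is one reason power-of-two segment widths are a common choice in hardware. The error this sketch reports is for the isolated function only and is not directly comparable to the per-packet percentages listed above.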

* Global e upper/lower bound: imeMax: 0.980103376442, imeMin: -0.959731767374, reeMax: 0.964647184301, reeMin: -0.960619162828
* Global h2(0) h3(0) h4(0) h3(1) h4(1) h4(2) upper/lower bound: imhMax: 1.53452587128, imhMin: -1.52857851982, rehMax: 1.64116168022, rehMin: -1.55049991608
* Global h1(0) h2(1) h3(2) h4(3) upper/lower bound: imhMax: 1.56784057617, imhMin: -1.36545133591, rehMax: 1.61977244559, rehMin: -1.40258966081
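
Bounds like these are typically used to size the integer part of a fixed-point word. As a rough sketch, the helper below (a hypothetical function, not part of the project) finds the smallest two's complement integer width, sign bit included, that covers a measured range. The notes do not say which word lengths were finally chosen from these bounds, so treat the result as a lower limit rather than the design's actual format.

```python
def signed_integer_bits(lo, hi):
    """Smallest two's complement integer width (sign bit included) covering [lo, hi]."""
    bits = 1
    # A signed word with `bits` integer bits represents [-2**(bits-1), 2**(bits-1)).
    while lo < -(2 ** (bits - 1)) or hi >= 2 ** (bits - 1):
        bits += 1
    return bits


# (min, max) taken over the real and imaginary parts recorded in the notes
bounds = {
    "e": (-0.960619162828, 0.980103376442),
    "h2(0) h3(0) h4(0) h3(1) h4(1) h4(2)": (-1.55049991608, 1.64116168022),
    "h1(0) h2(1) h3(2) h4(3)": (-1.40258966081, 1.61977244559),
}

for name, (lo, hi) in bounds.items():
    print(f"{name}: {signed_integer_bits(lo, hi)} integer bit(s) including sign")
```

Under this counting, `e` fits in the sign bit alone, while both `h` groups need one extra integer bit.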

iter 0, packet 1:
* imQ11: -0.142001119908
* reQ11: 0.562485393835
* imQ21: -0.307453532354
* reQ21: 0.450537929893
* imQ31: -0.0844166771276
* reQ31: 0.119031255133
* imQ41: 0.418220755411
* reQ41: 0.404726442066

APR: start with a slightly higher utilization rate
RTL:
Packet (%): 1.1195, 1.2325, 1.1312, 0.69547, 0.84894, 0.90427
Extra Packet:
Stage 2: truncate 4 bits in the addition
Packet: 1.1068, 1.2367, 1.135, 0.6938, 0.84331, 0.90278
Extra Packet:
12/12 10Retime:
Packet
8 bit: Fail (>2%)

12/13 Performance:
* Area: 682473 (um^2)
* Time: 119649 (ns)
* Power: 76 (mW)

```try_script.tcl```:
```
## PrimeTime Script
set power_enable_analysis TRUE
set power_analysis_mode time_based
set power_clock_network_include_register_clock_pin_power false
set CYCLE 5.0
read_file -format verilog ../02_SYN/Netlist/QR_Engine_syn.v
# read_file -format verilog ../04_APR/QR_Engine_pr.v
current_design QR_Engine
link
# ===== modify to your max clock freq ===== #
create_clock -period $CYCLE [get_ports i_clk]
set_propagated_clock [get_clock i_clk]
# ===== active window ===== #
# read_fsdb -strip_path testfixture/u_dut ../05_POST/testfixture.fsdb
read_fsdb -strip_path testfixture/u_dut ../03_GATE/testfixture.fsdb
update_power
report_power
report_power -verbose > try_active.power
# exit
```
12/13 陳 (Chen): syn cycle: 4.7, slack: -0.06, before clock gating hy_0_r and hy_1_r
set high_fanout_net_threshold 0
compile_ultra
optimize_registers
(no -gate_clock)
Area: 715501 (um^2)
Time: 119649 (ns)
Power: 84.3 (mW)
12/13 陳 (Chen): syn cycle: 4.8, slack: MET, after clock gating hy_0_r and hy_1_r
set high_fanout_net_threshold 0
compile_ultra
optimize_registers
(no -gate_clock)
$setuphold timing violation
Area: 705668 (um^2)
Time: 119649 (ns)
Power: --- (mW)

---

12/13 陳 (Chen): syn cycle: 4.7, slack: -0.14, after clock gating hy_0_r and hy_1_r
set high_fanout_net_threshold 0
compile_ultra
optimize_registers
(no -gate_clock)
$setuphold timing violation at 27500??
Area: 705668 (um^2)
Time: 119649 (ns)
Power: --- (mW)

Can't run power analysis.

-------------
12/14: Truncate redundant LSBs to reduce area / timing
Worst case: 1.78% (packet 12, SNR 10dB)
Jerry cycle=4.6
```compile_ultra -gate_clock```
Timing: MET, gate simulation: no violations (appears to be fine)
563433.257061(um^2) * 100196(ns) * 49.8(mW) = 1.8255826e+12
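
To compare runs on the same area * time * power product, a small helper can recompute it from the logged figures. This is only a convenience sketch: the function name and the run labels are made up here, and the numbers are copied from the entries above that list all three values. Evaluated as written, the 12/14 figures multiply to roughly 2.81e+12 rather than the recorded 1.8255826e+12, so one of the logged values may come from a different run.

```python
def area_time_power(area_um2, time_ns, power_mw):
    """Return the area * time * power product for one synthesis run."""
    return area_um2 * time_ns * power_mw


# (area in um^2, time in ns, power in mW), copied from the notes
runs = {
    "12/13": (682473, 119649, 76),
    "12/13 (before clock gate)": (715501, 119649, 84.3),
    "12/14": (563433.257061, 100196, 49.8),
}

for label, (area, time, power) in runs.items():
    print(f"{label}: {area_time_power(area, time, power):.4e}")
```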
Stage 8: truncate 2 bits
1.6718% 1.4323%
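
The effect of dropping LSBs can be checked in software before touching the RTL. Below is a minimal sketch of LSB truncation on two's complement integers. Whether the design truncates the operands before the addition or the sum after it is not stated in the notes, so `truncated_add` (a hypothetical helper) simply assumes operand truncation.

```python
def truncate_lsbs(value, k):
    """Clear the k least significant bits of a two's complement integer.

    Python's >> is an arithmetic shift, so the result rounds toward minus
    infinity, like dropping the low bits of a signed bus.
    """
    return (value >> k) << k


def truncated_add(a, b, k):
    """Add two fixed-point integers after dropping k LSBs from each operand (assumed order)."""
    return truncate_lsbs(a, k) + truncate_lsbs(b, k)


# Each truncated operand is low by at most 2**k - 1 LSBs, so the sum is low by
# at most 2 * (2**k - 1) LSBs.
k = 4
worst = max(
    (a + b) - truncated_add(a, b, k)
    for a in range(-64, 64)
    for b in range(-64, 64)
)
print(worst)
```

Because truncation always rounds down, the error is one-sided and introduces a small negative bias, which is worth keeping in mind when several truncated stages are chained.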
12/15 13:00
-max_fanout 10 \
set_optimize_registers
compile_ultra -retime -gate_clock
set CLOCK_PERIOD 4.3
power: 55.3
testcycle: 5.0
area: 605017
power: 56.4 -> 45
Current best main.tcl for synthesis:
```
set hdlin_translate_off_skip_text "TRUE"
set edifout_netlist_only "TRUE"
set verilogout_no_tri true
set plot_command {lpr -Plw}
set hdlin_auto_save_templates "TRUE"
set compile_fix_multiple_port_nets "TRUE"
set DESIGN "QR_Engine"
set CLOCK "i_clk"
set CLOCK_PERIOD 4.7
set high_fanout_net_threshold 0
sh rm -rf Netlist
sh rm -rf Report
sh mkdir Netlist
sh mkdir Report
read_file -format verilog ./filelist.v
current_design $DESIGN
link
create_clock $CLOCK -period $CLOCK_PERIOD
set_ideal_network -no_propagate $CLOCK
set_dont_touch_network [get_ports $CLOCK]
# ========== Do not modify this block ============== #
set_clock_uncertainty 0.1 $CLOCK
set_input_delay 1.0 -clock $CLOCK [remove_from_collection [all_inputs] [get_ports $CLOCK]]
set_output_delay 1.0 -clock $CLOCK [all_outputs]
set_drive 1 [all_inputs]
set_load 0.05 [all_outputs]
set_max_fanout 8 [current_design]
set_operating_conditions -max_library slow -max slow
set_wire_load_model -name tsmc13_wl10 -library slow
# =================================================== #
check_design
uniquify
set_clock_gating_style \
-max_fanout 10 \
-pos integrated \
-control_point before \
-control_signal scan_enable
# clock gating problem
# set_clock_gating_check -setup 0 -hold 0 [get_cells *]
set_fix_multiple_port_nets -all -buffer_constants [get_designs *]
set_fix_hold [all_clocks]
# set_optimize_registers
# compile_ultra -retime
# replace_clock_gates
compile_ultra -gate_clock
# compile_ultra
report_area > Report/$DESIGN\.area
# report_area -hierarchy > Report/$DESIGN\.area_hier
report_power > Report/$DESIGN\.power
report_timing -max_path 100 -delay_type max > Report/$DESIGN\.max.timing
report_timing -max_path 100 -delay_type min > Report/$DESIGN\.min.timing
report_clock_gating -gating_elements
report_clock_gating -ungated
set bus_inference_style "%s\[%d\]"
set bus_naming_style "%s\[%d\]"
set hdlout_internal_busses true
change_names -hierarchy -rule verilog
define_name_rules name_rule -allowed "a-z A-Z 0-9 _" -max_length 255 -type cell
define_name_rules name_rule -allowed "a-z A-Z 0-9 _[]" -max_length 255 -type net
define_name_rules name_rule -map {{"\\*cell\\*" "cell"}}
define_name_rules name_rule -case_insensitive
write -format verilog -hierarchy -output Netlist/$DESIGN\_syn.v
write_sdf -version 2.1 -context verilog Netlist/$DESIGN\_syn.sdf
write_sdc Netlist/$DESIGN\_syn.sdc
```