# 2024_Fall_NTU_AAHLS
* SoC Design Study Journal

Name : 陳瀚坪
Student ID : r12943168
user passwd: guxlss

## Lab 1 HLS Multiplier
* Lab1 HLS Multiplier (AXI-Lite)
* FPGA Board : PYNQ-Z2 (xc7z020clg400-1)
* FPGA P & R :![Multiplier_P&R](https://hackmd.io/_uploads/ByB-ztfAC.png)
* FPGA Layout:![Multiplier_Implemention](https://hackmd.io/_uploads/S1yMfFfAR.png)
* FPGA Result:![Multiplier_result](https://hackmd.io/_uploads/r1UOmYMR0.png)
* Notes
    * 1 Change #include "multiplication.h" to #include "Multiplication.h" (Ubuntu/Linux is case-sensitive, so the capitalization must match exactly)
    * 2 #pragma HLS INTERFACE ap_ctrl_none port=return in Multiplication.cpp must be commented out for Co-Sim; afterwards, uncomment it and rerun Synthesis before exporting the design as an IP
    * 3 Vivado: Create Block Design: double click Create Block Design -> add ZYNQ7 -> add multiplier multip_2num -> click regenerate layout -> click Run Block Automation -> double click ZYNQ7 block -> click Clock Configuration -> change to 100 MHz (because your clock period is 10 ns) -> click Run Connection Automation (remember to change the clock period)

## Lab 2 HLS FIR
* Lab2 HLS 11-tap FIR (AXI-Master & AXI Stream)
* FPGA Board: KV260 (Kria KV260 Vision AI Starter Kit)
* Lab2-1 AXI-Master
    * FPGA P & R :![FIRN11_MAXI_P&R](https://hackmd.io/_uploads/H1OzwKf0A.png)
    * FPGA Layout:![FIRN11_MAXI_layout1](https://hackmd.io/_uploads/B1nso_zR0.png)
    * FPGA Result:![FIRN11_MAXI_result1](https://hackmd.io/_uploads/H1bcodMCR.png)
* Lab2-2 AXI-Stream
    * FPGA P & R :![FIRN11_Stream_P&R](https://hackmd.io/_uploads/HybXwFGRR.png)
    * FPGA Layout:![FIRN11_Stream_layout1](https://hackmd.io/_uploads/ryzenOGRR.png)
    * FPGA Result:![FIRN11_Stream_result](https://hackmd.io/_uploads/rJOghdG0R.png)
* Notes
    * 1 In the FIRTester program, change the Windows fc command to the Ubuntu diff command
    * 2 Change the FPGA part from xc7z020clg400-1 to xck26-sfvc784-2LV-c
    * 3 In the Jupyter Notebook, the file location on the KV260 is ol = Overlay("/home/root/jupyter_notebook/FIRN11Stream.bit")
    * 4 In Vivado, remember to change the HP port settings of the Zynq UltraScale+ MPSoC

## Lab 3 Verilog FIR
* Lab3 Verilog 11-tap FIR (AXI-Master & AXI Stream)
![image](https://hackmd.io/_uploads/S1yUggm7kg.png)
![image](https://hackmd.io/_uploads/rkF9glmXkg.png)
![image](https://hackmd.io/_uploads/H10Uyem7Jx.png)
![image](https://hackmd.io/_uploads/B1-EkgXQkg.png)
![image](https://hackmd.io/_uploads/Hyij1l77ye.png)
![image](https://hackmd.io/_uploads/Bkb3kgm71e.png)
![image](https://hackmd.io/_uploads/HkIhkxQQJx.png)
![image](https://hackmd.io/_uploads/Bk3hJx7X1e.png)
![image](https://hackmd.io/_uploads/rklaJgXQJe.png)
![image](https://hackmd.io/_uploads/SyPaJeQXkl.png)
![image](https://hackmd.io/_uploads/Hkp6JeXm1l.png)
![image](https://hackmd.io/_uploads/HkZCJxQXke.png)
![image](https://hackmd.io/_uploads/r1HRkg7X1g.png)
![image](https://hackmd.io/_uploads/Hkt0kgmXJx.png)
* Notes
    * 1 The control signals on the AXI bus (valid and ready) are all independent of each other
    * 2 data-in and data-out are best latched to add timing margin, giving software more time to send and receive data
    * 3 Whether a design is correct must be verified from the design itself, not from the testbench alone

## Lab 4 Caravel-FIR Hardware-Software Codesign
* Lab4 Caravel-FIR Hardware-Software Codesign
* Caravel RISC-V
![image](https://hackmd.io/_uploads/BkH13taSkx.png)
![image](https://hackmd.io/_uploads/ryok2FTSkl.png)
![image](https://hackmd.io/_uploads/S1Qe2tpHye.png)
![image](https://hackmd.io/_uploads/r13e2FprJg.png)
![image](https://hackmd.io/_uploads/HyVZ2Y6rJe.png)
![image](https://hackmd.io/_uploads/Sk_bnYTBkg.png)
![image](https://hackmd.io/_uploads/BJCbhFTH1g.png)
![image](https://hackmd.io/_uploads/HkXz3tTH1e.png)
![image](https://hackmd.io/_uploads/rkYz3tTBke.png)
![image](https://hackmd.io/_uploads/BypM3tTSkg.png)
![image](https://hackmd.io/_uploads/r1VQ2FpByl.png)
![image](https://hackmd.io/_uploads/HyO73FpBkx.png)
![image](https://hackmd.io/_uploads/SJTm2tpr1x.png)
![image](https://hackmd.io/_uploads/BkzV3KTBye.png)
![image](https://hackmd.io/_uploads/HyL4htpHyg.png)
![image](https://hackmd.io/_uploads/rk5EhKpSyl.png)
![image](https://hackmd.io/_uploads/HJyBhKpBkg.png)
![image](https://hackmd.io/_uploads/rkMB3K6ryg.png)
![image](https://hackmd.io/_uploads/SkLS3tTrke.png)
* Notes
    * 1 
Before optimization there was a read-after-write hazard, so the instruction order had to be changed, and -O2 was added so the compiler optimizes the code further
    * 2 The importance of the input and output latches shows up clearly in Lab4

## Lab 5 Caravel SoC FPGA Integration
* Lab5 Caravel SoC FPGA Integration
* FPGA Board: PYNQ-Z2
![image](https://hackmd.io/_uploads/rJA56Y6Hke.png)
* a. counter_wb
![image](https://hackmd.io/_uploads/Bk0DpYpH1g.png)
* b. counter_la
![image](https://hackmd.io/_uploads/HJfZaKaHkl.png)
* c. gcd_la
![image](https://hackmd.io/_uploads/HyYW0YTBkl.png)
* Notes
    * 1 Update the path to caravel_fpga.bit in caravel_fpga.ipynb
    ![image](https://hackmd.io/_uploads/rJxiRKTH1e.png)

## Final WLOS Optimization
* Final WLOS Optimization (SDRAM, DMA, FIR, Matmul, Quick sort)
* FPGA Board: PYNQ-Z2
![image](https://hackmd.io/_uploads/rkJSKqTHJg.png)
![image](https://hackmd.io/_uploads/rkIHY56HJl.png)
![image](https://hackmd.io/_uploads/rycHY9aB1g.png)
![image](https://hackmd.io/_uploads/rJJLY96Byx.png)
![image](https://hackmd.io/_uploads/rkX8Y5aSyl.png)
![image](https://hackmd.io/_uploads/rkPIFcTB1l.png)
![image](https://hackmd.io/_uploads/H1iIKcpSyl.png)
![image](https://hackmd.io/_uploads/Sy1PYqTHJx.png)
![image](https://hackmd.io/_uploads/B1rPFqpBkx.png)
![image](https://hackmd.io/_uploads/HJ_vYcpSJl.png)
![image](https://hackmd.io/_uploads/HJbOF5aSyl.png)
![image](https://hackmd.io/_uploads/rkTKt96B1l.png)
![image](https://hackmd.io/_uploads/SyZct5TBJl.png)
![image](https://hackmd.io/_uploads/HyrcK5aB1e.png)
* Notes
    * 1 Bank interleaving in the SDRAM uses bits 9 and 10
    * 2 With the help of DMA, the CPU only has to issue commands to the DMA and receive the final results, so multiple accelerators can run concurrently
    * 3 The Quick Sort hardware needs to be optimized with reference to the paper

## Week2 In-Class Q&A
* Week2 In-Class Q&A
* Q:In Lab#1 (multiplication), how does the kernel get its two operands?![Screenshot 2024-09-19 160156](https://hackmd.io/_uploads/rkIqqUKTC.jpg) ![Screenshot 2024-09-19 160352](https://hackmd.io/_uploads/BkbWsUtp0.jpg)
* A: #pragma HLS INTERFACE s_axilite port=n32In2 #pragma HLS INTERFACE s_axilite port=n32In1 AXI-Lite (32 bits) is used so that data can be transferred from the SoC cache to the FPGA

## Week3 In-Class Q&A
* Week3 In-Class Q&A
* Q:FSM Design for better synthesis timing - 
register out (p#11 - 14)
* A:
    * Stability: a registered output keeps the output signal stable within a clock cycle, which reduces glitches.
    * Timing optimization: register out helps timing because the register synchronizes output changes to the clock, avoiding problems caused by combinational logic delay.

**Page.11**
![image](https://hackmd.io/_uploads/rJitxJlAA.png)
**Fig.Setup Slack**
![image](https://hackmd.io/_uploads/BynkrkeRR.png)
**Page.12**
* A: The Mealy logic is duplicated into Module C and Module D. When logic synthesis is run separately on the FSM, Module C, and Module D, each module gets a full cycle, unlike the figure on the left, where Module C and Module D do not get a full cycle
![image](https://hackmd.io/_uploads/SyBQSJx0C.png)
**Page.13**
![image](https://hackmd.io/_uploads/BJu4rkxC0.png)
**Add extra FF**
![image](https://hackmd.io/_uploads/S1YTDuG00.png)
**Page.14 Final version**
![image](https://hackmd.io/_uploads/H1FrSJx0C.png)

## Week4 In-Class Q&A
* Week4 In-Class Q&A (none)

## Week5 In-Class Q&A
* Week5 In-Class Q&A
* Q:Explain and compare SMT v.s. Multi-core?
* A:Simultaneous Multithreading and Multicore
    * **SMT** ![image](https://hackmd.io/_uploads/HJejzVAyJg.png)
    * **Multicore** ![image](https://hackmd.io/_uploads/B19YQ401kl.png)
    * **Comparison** ![image](https://hackmd.io/_uploads/rkUkHDskke.png)

## Week6 In-Class Q&A
* Week6 In-Class Q&A (none)

## Week7 In-Class Q&A
* Week7 In-Class Q&A
* Q:Explain the system operation for IO read/write in a cache system
* A:![image](https://hackmd.io/_uploads/ByZyqy771g.png)
    * Pro: cache snooping keeps the data in the cache and in system memory consistent, avoiding incorrect or inconsistent data
    * Pro: cache hits reduce the number of accesses to system memory, greatly lowering latency and improving overall performance
    * Con: keeping the cache and system memory coherent requires extra operations (such as invalidate or write-back), which adds overhead
    * Con: when a process reads data and a cache miss occurs, latency increases and overall performance drops

## Week8 In-Class Q&A
* Week8 In-Class Q&A
* Q:Explain multi-level cache and snoop filter mechanism (p#83-85)
* A:P.83 [Note: In a multi-level cache design, the L1 cache provides the fastest data access, while L3, as the shared last-level cache, is used for cache coherence snoop filtering]![image](https://hackmd.io/_uploads/HkPjoyXXJl.png)
P.84 [Note: An inclusive cache requires that any data in L1 also be present in L2, whereas an exclusive cache allows data to reside in only one of L1 or L2]![image](https://hackmd.io/_uploads/Hyq2oJQXJl.png)
<div style="background-color: #e6f7ff; padding: 10px; border-radius: 5px; border: 1px solid #91d5ff;"> P.85 [Note: The Snoop 
Filter runs at the last-level cache and reduces the snoop requests sent to the L1/L2 caches, making coherence checks more efficient] </div> <img src="https://hackmd.io/_uploads/rkB67gQ7yx.png" alt="image">
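
The snoop filter note on P.85 can be illustrated with a small software model. This is only a sketch under simplifying assumptions, not the mechanism from the lecture slides: the `SnoopFilter` class, its method names, and the counters are hypothetical. It models a directory kept next to the last-level cache that records which cores may hold each line, so a write only snoops the recorded sharers instead of broadcasting to every core. Reads are assumed never to need a snoop.

```python
from collections import defaultdict


class SnoopFilter:
    """Toy directory kept next to the last-level cache (hypothetical model)."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        # line address -> set of core ids whose private L1/L2 may hold the line
        self.sharers = defaultdict(set)
        self.snoops_sent = 0       # snoops issued when the filter is used
        self.snoops_broadcast = 0  # snoops a filterless broadcast would issue

    def read(self, core, addr):
        """A core loads a line: record it as a sharer (no snoop in this model)."""
        self.sharers[addr].add(core)

    def write(self, core, addr):
        """A core writes a line: invalidate only the other recorded sharers."""
        targets = self.sharers[addr] - {core}
        self.snoops_sent += len(targets)
        self.snoops_broadcast += self.num_cores - 1
        # After the write, the writer is the only holder of the line.
        self.sharers[addr] = {core}
        return sorted(targets)


if __name__ == "__main__":
    sf = SnoopFilter(num_cores=4)
    sf.read(0, 0x100)
    sf.read(1, 0x100)
    sf.read(2, 0x200)
    print(sf.write(0, 0x100))  # cores that must invalidate their copy
    print(sf.write(3, 0x300))  # line held by nobody else: no snoop needed
    print(sf.snoops_sent, sf.snoops_broadcast)
```

In this model the saving comes from the difference between `snoops_sent` and `snoops_broadcast`: private L1/L2 caches that do not hold the line are never disturbed.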