**Lab 4 report**
Group: 2
Members: 劉祐瑋, 陳昇達, 劉佩雯
Lab: 4
HackMD link: https://hackmd.io/YiMD_LI1Q9elehhYnK5m0A?view (feel free to read the report there)
Github link: https://github.com/nthuyouwei/asoclab/tree/main/lab04
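This report does not explain Labs 4-1 and 4-2 separately, for two reasons. First, the video https://www.youtube.com/watch?v=xC7MUxudTwg already covers them in detail. Second, Lab 4-3 requires integrating the FIR into the SoC-side user project (uspj), which involves both simulation and validation, and these correspond to the content of 4-1 and 4-2 respectively.
Table of contents: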
[TOC]
# Simulation:
## Introduction
We need to place the FIR into USERPRJ_SUBSYS in Caravel-FSIC, at the location shown in the figure below:

Put FSIC into Caravel SoC user project area

The mprj pins coming out of io_serdes connect to the io_serdes on the FPGA side, as shown below:

Finally, the overall block diagram is shown below:
Schematic diagram:

The block design in Vivado (here ps_axil_0 corresponds to FPGA-FSIC):

### module introduction:
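This section introduces a few notable modules:
- AXI Verification IP (VIP)
The AXI Verification IP core is developed to support the simulation of customer-designed AXI-based IP. Here it models the sending and receiving sides of the AXI interface. It is not needed during validation, where the design connects directly to the ZYNQ processing system and is controlled from Python.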
- axi_vip_0:
- PS master cycle generation
- Config module in the caravel and ps_axi
- axi_vip_1:
- Memory slave module
- Receive data from LA in caravel side
- Flush to the memory slave module from ps_axi
- axi_vip_2, axi_vip_3:
- Act as the user DMA targets
- One is used for the read data pattern, the other for the write data pattern
- User DMA
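The User DMA reads the required data directly from memory, converts it to AXI-Stream, and sends it into FPGA-FSIC (downstream). In the other direction, FPGA-FSIC can send AXI-Stream data to the DMA, which writes it back to memory (upstream).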

- User DMA configuration register:

- User DMA data flow:

- 1. getinstream
This function reads packets from the input stream in_stream, processes them, and outputs the results to out_stream and out_counts. It checks for errors such as packet sequence and integrity. If the conditions are not met, it sets s2m_err to indicate an error. Additionally, it tracks the total data length processed and pushes the current count to out_counts when the maximum output length is reached.

- 2. streamtoparallelwithburst
This function reads data from the input stream in_stream and writes it to out_memory based on the number specified in in_counts. It uses a loop to handle multiple data blocks and stops when the preset total length in_s2m_len is reached or when the buffer is full. This function also manages the status of the stream, such as resetting or setting output statuses.

Note: **BUF_LEN** must be changed to the length we need, otherwise a DEADLOCK occurs. Ideally this should be set through a configuration register rather than being a fixed length.
- 3. paralleltostreamwithburst
This function converts parallel data from a memory array back into a stream format suitable for sequential transmission. It handles specific data lengths and widths, ensuring data continuity and consistency.

- 4. sendoutstream
This function forwards processed data from one output stream to another. It primarily manages the forwarding of processed data to the next stage or output port while also managing the stream's status and error checking.
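
To illustrate the BUF_LEN note under streamtoparallelwithburst, the following Python sketch models only the counting behavior described above. It is a simplified model and not the HLS source: it assumes a count is pushed to out_counts only when BUF_LEN words have accumulated, and the names `model_out_counts`, `total_words`, and `buf_len` are hypothetical.

```python
def model_out_counts(total_words, buf_len):
    """Simplified model: one count is pushed per full buffer of buf_len words.

    Returns (pushed_counts, stranded_words). In this model, words left in a
    partially filled buffer are never announced to the consumer.
    """
    if buf_len <= 0:
        raise ValueError("buf_len must be positive")
    full_buffers, stranded_words = divmod(total_words, buf_len)
    return [buf_len] * full_buffers, stranded_words


# Hypothetical numbers: a 64-word transfer with a matching and a mismatched BUF_LEN.
matched = model_out_counts(64, 64)
mismatched = model_out_counts(64, 1024)
print(matched)
print(mismatched)
```

In this model the mismatched case pushes no count at all, so a consumer waiting on in_counts would never start writing, which is one plausible way the deadlock can arise.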

## Testbench
Testbench code flow - initial

FIR_Test():

The [corresponding code](https://github.com/nthuyouwei/asoclab/blob/main/lab04/vivado/fsic_tb.sv) is on GitHub.
## Result
We can compare [updma_output_fir.log](https://github.com/nthuyouwei/asoclab/blob/main/lab04/vivado/updma_output_fir.log) against [fir_golden.txt](https://github.com/nthuyouwei/asoclab/blob/main/lab04/vivado/fir_golden.txt) to check the simulation output.
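
The comparison between the log and the golden file can also be automated. The sketch below is one possible way to do it in Python. It assumes each file holds one decimal integer per line, which may not match the actual file layout, and the helper names `parse_samples` and `find_mismatches` are hypothetical.

```python
from itertools import zip_longest


def parse_samples(lines):
    """Parse one decimal integer per non-empty line (assumed file format)."""
    return [int(line.strip()) for line in lines if line.strip()]


def find_mismatches(actual, golden):
    """Return (index, actual, golden) for every position that differs.

    A missing sample on either side is reported as None.
    """
    return [
        (i, a, g)
        for i, (a, g) in enumerate(zip_longest(actual, golden))
        if a != g
    ]


# Inline stand-ins for the file contents; real use would pass open(path) instead.
actual = parse_samples(["0\n", "-10\n", "-29\n"])
golden = parse_samples(["0\n", "-10\n", "-29\n"])
print(find_mismatches(actual, golden))
```

An empty list means the two sequences agree sample by sample.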

In addition, we can add `$dumpfile` to tb.sv to generate a .vcd file and view the waveform:
We can observe that, because of the user DMA, sm_tready and ss_tvalid stay at 1 the whole time:

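We can of course also inspect any other signals of interest. This is the benefit of running simulation first: it lets us locate which module level is at fault. For example, at the beginning the FIR received the same data twice in a row, and the VCD showed that the io_serdes module was double sampling.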
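
A double-sampling symptom like the one described above can also be flagged from a list of captured samples. The following Python sketch is only an illustration: `find_repeated_samples` is a hypothetical helper, and a repeated value is merely a hint, since legitimate data may contain equal neighbors.

```python
def find_repeated_samples(samples):
    """Return the indices whose value equals the immediately preceding sample."""
    return [i for i in range(1, len(samples)) if samples[i] == samples[i - 1]]


# Hypothetical capture in which every input word was sampled twice.
captured = [1, 1, 2, 2, 3, 3]
print(find_repeated_samples(captured))
```

If nearly every odd index is reported, the stream is probably being sampled twice per word.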
# Validation
After simulation, we need to put the design on the board for validation.
## Introduction
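Unlike simulation, which uses the VIP, here we need to connect the ZYNQ processing system so that the PS side can program the design with Python code. Note that the original tcl file does not set up userdma, so we need to connect it manually.
- Schematic diagram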

- The block design in Vivado

- Base Address:

Note: unless set explicitly, the base_address here may differ from the one used in simulation.
### module introduction:
This section introduces one notable module:
- AXI_QUAD_SPI

The AXI Quad SPI (Serial Peripheral Interface) is a peripheral used in FPGA (Field Programmable Gate Array) and SoC (System on Chip) designs, providing a communication interface between the host (like an FPGA or a processor within an SoC) and external devices like memory chips, sensors, and other peripherals. It is based on the widely used SPI protocol, but extends it to use four lines for data transfer, enhancing the throughput compared to traditional single-line SPI.
Key Features of AXI Quad SPI:
- Four Data Lines: Unlike traditional SPI, which uses a single data line for each direction (MOSI and MISO), Quad SPI uses four lines that can be used for bidirectional data transfer, significantly increasing the data transfer rate.
- AXI Interface: It utilizes the AXI (Advanced eXtensible Interface) protocol, which is part of the ARM AMBA (Advanced Microcontroller Bus Architecture) specification. This makes it highly efficient and compatible with a wide range of ARM processors and other architectures that support AXI.
- Enhanced Throughput: The use of multiple lines allows for higher data throughput. In many implementations, Quad SPI can operate at double or even quadruple the rate of standard SPI, making it ideal for applications requiring fast data transfers, such as loading code from serial flash memories.
- Flexibility: The AXI Quad SPI can be configured in different modes according to the requirements of the target device, including standard SPI, dual SPI, or quad SPI modes. This flexibility allows designers to balance pin count and throughput needs.
- Read and Write Operations: It supports both read and write operations, enabling not only data fetching from external flash memories but also data logging and configuration settings to be written back to peripheral devices.
- Standard and Memory Mapped Modes: Typically, the AXI Quad SPI can be operated in two modes: a standard mode where each transaction is initiated by the processor, and a memory-mapped mode which allows the SPI-connected memory to be mapped directly into the processor’s address space, simplifying software design.
Use Case in this lab:
- It enables pass-through mode so that the PS side can write the firmware code through FPGA-FSIC and Caravel to the SPI flash.
## Testbench - Python
code : https://github.com/nthuyouwei/asoclab/blob/main/lab04/jupyter_notebook/fir_test_final.ipynb
The first part follows Lab 4-2 to reset the SoC and FPGA sides. It also loads the firmware code and writes each module's configuration.

Finally, we configure the DMA and FIR in the same way as fir_test(). Here we use the mmio library to program them.


Note: the Base Address of userdma here is 0x40020000.

Note: we must first call the allocate function to set up the buffers, so that physical addresses are available for the user DMA's input and output memory addresses.




Note: 0x10 reads 1 when the DMA is done, and 0 otherwise.
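
The done flag noted above is typically polled in a loop. Below is a minimal Python sketch of such a loop with a timeout. It assumes that 0x10 is an offset relative to the userdma base address and that the register reads exactly 1 when done. The names `wait_for_dma_done` and `FakeDma` are hypothetical, and `FakeDma` only stands in for the hardware so the sketch runs without a board.

```python
import time


def wait_for_dma_done(read_reg, status_offset=0x10, done_value=1,
                      timeout_s=1.0, poll_interval_s=0.001):
    """Poll read_reg(status_offset) until it equals done_value or time runs out.

    Returns True if the done value was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_reg(status_offset) == done_value:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


class FakeDma:
    """Stand-in for the hardware: reports done on the third status read."""

    def __init__(self):
        self.reads = 0

    def read(self, offset):
        self.reads += 1
        return 1 if (offset == 0x10 and self.reads >= 3) else 0


print(wait_for_dma_done(FakeDma().read))
```

On the board, the `read` method of a PYNQ `MMIO` object created at the userdma base address could be passed as `read_reg`. The timeout keeps the notebook from hanging if the DMA never finishes.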

## Result
We can print buf_out to check whether the FIR output travelled from the Caravel-side io_serdes to the FPGA-side io_serdes and was then written to memory by the user DMA.

The result shows that the data was written to memory and that the written values match the golden data.
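
For reference, golden data for an FIR filter can be computed in software with the direct-form sum y[n] = sum over k of h[k] * x[n-k]. The Python sketch below shows that calculation. The function name `fir_reference` and the taps and samples in the example are hypothetical and are not the lab's actual coefficients. Python integers do not wrap, so any fixed-width overflow in hardware is not modeled.

```python
def fir_reference(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * samples[n - k], zero initial state."""
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += tap * samples[n - k]
        output.append(acc)
    return output


# Hypothetical two-tap filter applied to a short ramp.
print(fir_reference([1, 2, 3], [1, 1]))
```

The resulting list can then be compared element by element with the values read back from buf_out.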
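# Supplement
As in lab1-fsic-sim, the FIR length and coef can be written from either the FPGA side or the SoC side. At the start of this lab I wrote them from the testbench, which models writing from the FPGA side. However, we can also write them by modifying the firmware code. Combining this with what we learned in last semester's lab4-2, we can write firmware code that makes the CPU issue wb cycles and write the configuration registers. The steps are demonstrated below.
Add two parts to fsic.c: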


Use the ./run_fw command to generate the new .hex and .coe files.
Modify the simulation tb (comment out the write cycle):

Result:
The figure below shows that the coef is also written correctly:

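Finally, comparing the speed of the two approaches, writing the coef with firmware code takes longer than writing it from the FPGA side. This is because an instruction issued by the CPU goes through a chain of transactions before reaching wb, which is much slower than issuing the write cycle directly from the FPGA side: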
