# Lab1
## Brief Introduction about the Overall System
This lab uses Vitis HLS and Vivado to translate C/C++ code into RTL hardware code, so that we can focus on the behavior/algorithm of the hardware instead of on RTL coding.
The two tools relate as follows:
* Vitis HLS: Synthesizes C++ algorithms into hardware IP.
* Vivado: Generates block design applications for interfacing with the synthesized IP.
In Vitis HLS, we go through these three steps to verify the C code and the design generated from it:
* C sim (C simulation): Verifies the functional correctness of the C++ algorithm.
* C synthesis: Synthesizes the C++ code into an RTL hardware description.
* Co-sim (C/RTL co-simulation): Compares C-simulation and RTL-simulation results.
## What is Observed & Learned
* Basic usage of Vivado and Vitis HLS
* How the .bit, .hwh, and .ipynb files are used when running the design on the board
* Pragmas act as directives that apply constraints and optimizations.
# Lab2
## Brief Introduction about the Overall System
We implement the FIR computation in hardware using different AXI protocols.
We used two types of AXI interface:
* M-AXI (FIR_N11_MAXI):
Directly accesses DDR using M-AXI protocol.
No need for DMA assistance.
* AXI-Stream (FIR_N11_Stream):
Uses AXI-Stream as the interface.
Requires DMA (adapter) assistance for external DDR read/write.
Offers a more modular approach by delegating address handling to other IPs.
## What is Observed & Learned
M-AXI and AXI-Stream interface differences:
* Both have handshake mechanisms (valid & ready).
* AXI-Stream has no address channel. It only transmits and receives data and carries no address or burst-length information.
* M-AXI performs address-based read/write operations.

While writing the Lab2 report, we also learned the following:
* AXI-Stream, AXI-Full (M-AXI, memory-mapped master/slave), and AXI-Lite all belong to the AMBA AXI4 protocol family.
* AXI-Lite handles small data transfers with address and handshake.
* AXI-Full supports burst transfers, moving many data beats after a single address handshake. AXI-Lite does not support bursts: each transfer carries a single data beat.
* AXI-Stream lacks an address concept but includes handshake for data transmission.
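
The contrast between the two interfaces can be sketched with a small cycle-level model. This is only an illustration, not the lab code: the function names `stream_transfer` and `maxi_burst_read`, the word-addressed memory list, and the ready patterns are all assumptions made for this example.

```python
def stream_transfer(data, ready_pattern):
    """Model one valid/ready channel, one loop iteration per clock cycle.

    data          -- words the sender wants to transmit, in order
    ready_pattern -- receiver's ready value per cycle (repeats when exhausted)
    Returns (received_words, cycles_used).
    """
    if not any(ready_pattern):
        raise ValueError("receiver is never ready; the transfer would stall forever")
    received = []
    cycle = 0
    index = 0
    while index < len(data):
        valid = True  # in this sketch the sender always has a word pending
        ready = ready_pattern[cycle % len(ready_pattern)]
        if valid and ready:  # a word moves only when both signals are high
            received.append(data[index])
            index += 1
        cycle += 1
    return received, cycle


def maxi_burst_read(memory, address, burst_len):
    """Model an address-based burst read: one address, burst_len data beats."""
    return [memory[address + beat] for beat in range(burst_len)]


if __name__ == "__main__":
    words = [10, 20, 30]
    print(stream_transfer(words, [True]))         # receiver always ready
    print(stream_transfer(words, [False, True]))  # receiver ready every other cycle
    print(maxi_burst_read([5, 6, 7, 8, 9], address=1, burst_len=3))
```

Note that `stream_transfer` never sees an address: where the words come from or go to is the job of another block (the DMA in this lab), while `maxi_burst_read` has to be told where to start and how many beats to move.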
# Lab3
GitHub link: https://github.com/b3nsonchang/SoC_lab3_fir
This repository includes the waveforms, the design and testbench, the synthesis report, and the lab report.
## Circuit & Function SPEC
Implementation of a finite impulse response (FIR) filter with n = 11 taps, using only 1 multiplier, 1 adder, and 2 BRAMs with 11 entries each.
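The circuit needs to compute the FIR output `Y[n] = h[0]·X[n] + h[1]·X[n-1] + ... + h[10]·X[n-10]`, where `h[k]` are the tap parameters.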
## What is Observed & Learned
* Verilog code must be written carefully to control the number of arithmetic operators it infers; otherwise the design fails the spec (1 multiplier, 1 adder).
* AXI-Lite protocol decoding
# Lab4-1
## Brief Introduction about the Overall System
The firmware code **fir.c** emulates the shift RAM, multiplies each entry by its corresponding tap parameter, and accumulates the result.
**counter_la_fir.out** is used to store the register values into data as an initialization step.
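
The shift, multiply, and accumulate steps described above can be sketched as follows. This is a Python model for illustration only, not the contents of **fir.c**. The tap values in `TAPS` and the 1..11 input ramp are assumptions that this report does not state; they were chosen because they reproduce the `Received:` values in the fir log of the Final Project section.

```python
# Assumed tap parameters (not stated in this report); 11 taps for n = 11.
TAPS = [0, -10, -9, 23, 56, 63, 56, 23, -9, -10, 0]


def fir_shift(samples, taps=TAPS):
    """Software FIR that mimics a shift RAM.

    For every new sample: shift the RAM by one entry, insert the sample
    at index 0, then multiply each entry by its tap and accumulate.
    """
    shift_ram = [0] * len(taps)
    outputs = []
    for x in samples:
        # Shift from the oldest entry down so no value is overwritten early.
        for i in range(len(taps) - 1, 0, -1):
            shift_ram[i] = shift_ram[i - 1]
        shift_ram[0] = x

        acc = 0
        for i in range(len(taps)):
            acc += taps[i] * shift_ram[i]  # one multiply and one add per tap
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    # Assumed input: a ramp 1, 2, ..., 11.
    print(fir_shift(range(1, 12)))
```

The inner loop performs one multiply and one add per tap, which is the same amount of work that a design limited to one multiplier and one adder has to spread over consecutive cycles.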
# Lab4-2
## Brief Introduction about the Overall System
Design a Wishbone protocol decoder that translates Wishbone transactions into AXI-Lite transactions, so that the design we made in Lab3 can perform the same computation even when the system does not use the same data-transfer protocol as the design.
With this diagram,

we can draw a more detailed block diagram,

There are other address specifications we need to take care of, such as the AP signal address and the X[n] and Y[n] addresses.
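
A decoder of this kind can be sketched as a function that maps a Wishbone address to a target interface. Every address offset below (`USER_BASE`, `AP_CTRL_OFFSET`, `TAP_BASE_OFFSET`, `X_IN_OFFSET`, `Y_OUT_OFFSET`) is a hypothetical value chosen for illustration, since this report does not list the actual map. The routing labels are also only names for this example.

```python
# Hypothetical address map (assumption; check the lab spec for real values).
USER_BASE = 0x30000000
AP_CTRL_OFFSET = 0x00    # AP control/status register
TAP_BASE_OFFSET = 0x40   # first tap parameter
X_IN_OFFSET = 0x80       # X[n] input, forwarded as AXI-Stream input
Y_OUT_OFFSET = 0x84      # Y[n] output, read back from AXI-Stream output
REGION_SIZE = 0x100


def decode_wishbone(address, write_enable):
    """Return (target, offset) for one Wishbone access.

    target is one of: axi_lite_write, axi_lite_read, axi_stream_in,
    axi_stream_out, unmapped. offset is only set for AXI-Lite accesses.
    """
    offset = address - USER_BASE
    if not 0 <= offset < REGION_SIZE:
        return ("unmapped", None)
    if offset == X_IN_OFFSET:
        # X[n] is write-only from the firmware's point of view.
        return ("axi_stream_in", None) if write_enable else ("unmapped", None)
    if offset == Y_OUT_OFFSET:
        # Y[n] is read-only from the firmware's point of view.
        return ("unmapped", None) if write_enable else ("axi_stream_out", None)
    if offset < X_IN_OFFSET:
        # Configuration space (AP control, tap parameters) goes to AXI-Lite.
        return ("axi_lite_write" if write_enable else "axi_lite_read", offset)
    return ("unmapped", None)


if __name__ == "__main__":
    print(decode_wishbone(USER_BASE + AP_CTRL_OFFSET, True))
    print(decode_wishbone(USER_BASE + X_IN_OFFSET, True))
    print(decode_wishbone(USER_BASE + Y_OUT_OFFSET, False))
```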
## What is Observed & Learned
* FIR Engine Theoretical Throughput versus Actual Throughput
For 64 output data, each requiring 11 multiplications and 10 additions, the total is 21 * 64 = 1344 operations. The theoretical time is calculated from the cycles needed for the AXI-Lite and AXI-Stream transfers. The overall operation involves 12 cycles for each set of 5 data and 11 tap parameters, plus additional cycles for control signals.
The theoretical time is **1793** cycles, while the measured time is **32471** cycles. The discrepancy arises because the design has to wait for the Wishbone interface to complete the firmware operations before each AXI-Lite confirmation.
* Latency for Firmware to Feed Data
The firmware needs at least 8 cycles to send one Xn to the design (5 cycles for WB to AXI-lite read, and 3 cycles for WB to AXI stream input).
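
The arithmetic behind these figures can be checked with a few lines of Python. All numbers are taken from the text above. The only assumption is that one X[n] is fed per output, so 64 outputs need 64 input transfers.

```python
OUTPUTS = 64
MULTS_PER_OUTPUT = 11
ADDS_PER_OUTPUT = 10
THEORETICAL_CYCLES = 1793
ACTUAL_CYCLES = 32471
WB_TO_AXI_LITE_READ = 5   # cycles, from the text above
WB_TO_AXI_STREAM_IN = 3   # cycles, from the text above


def throughput_summary():
    """Recompute the operation count, slowdown, and minimum feed time."""
    total_ops = OUTPUTS * (MULTS_PER_OUTPUT + ADDS_PER_OUTPUT)
    slowdown = ACTUAL_CYCLES / THEORETICAL_CYCLES
    feed_per_sample = WB_TO_AXI_LITE_READ + WB_TO_AXI_STREAM_IN
    # Assumes one X[n] per output; a lower bound on feed time only.
    min_feed_cycles = OUTPUTS * feed_per_sample
    return {
        "total_ops": total_ops,
        "slowdown": slowdown,
        "feed_per_sample": feed_per_sample,
        "min_feed_cycles": min_feed_cycles,
    }


if __name__ == "__main__":
    for name, value in throughput_summary().items():
        print(name, value)
```

Under these figures the measured run is about 18 times slower than the theoretical one, and feeding the inputs alone costs at least 512 cycles. The rest of the gap is not broken down in this report.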
# Lab5
## Brief Introduction about the Overall System
There are 4 IPs in this system:
* read_romcode
Purpose: Read the ROM code from system memory (DDR) using AXI master and store it in BRAM.
Operation: Iteratively reads ROM code from system memory and writes it to BRAM. Limits the read count if it exceeds the available data.
* spiflash
Purpose: Interfaces with SPI flash for reading ROM code.
Operation: Utilizes input (io0) as SPI flash input and output (io1) as SPI flash output. Reads ROM code from BRAM and outputs it accordingly.
* ResetControl
Purpose: Sends OUTPIN control signals using AXI-Lite. The ap_ctrl sends signals internally to control the RISCV CPU.
* caravel_ps
Purpose: Interconnect IP for the PS side. Manages the interface between MPRJ and PS, allowing data transfer between them.
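
The copy-with-limit behavior of read_romcode can be sketched as below. This is a behavioral Python model under assumed names (`system_memory`, `bram_size`, `requested_length`), not the HLS source of the IP.

```python
def read_romcode(system_memory, bram_size, requested_length):
    """Copy ROM code words from system memory into a BRAM model.

    The read count is clamped so it never exceeds the requested length,
    the BRAM capacity, or the data actually available in system memory.
    Returns (bram_contents, words_copied).
    """
    count = min(requested_length, bram_size, len(system_memory))
    bram = [0] * bram_size
    for i in range(count):  # iterative word-by-word copy
        bram[i] = system_memory[i]
    return bram, count


if __name__ == "__main__":
    rom = [0x6F, 0x00, 0x00, 0x0B, 0x13]
    # Request more words than exist: the count is limited to 5.
    print(read_romcode(rom, bram_size=8, requested_length=16))
```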
The tool usage in this lab is the same as in the previous lab, so we skip the "What is Observed & Learned" part.
# Lab6
## Brief Introduction about the Overall System

From the above diagram, we can draw a new block diagram for UART,

We did not need to design the RTL.
## What is Observed & Learned
* On-board verification with a modified ipynb
Referring to the discussions on GitHub #175, the team employed a while loop to print checkbit and corresponding time for verification. Additionally, in Top.c, a while loop was inserted at each reg_mprj_datal output interval to slow down the firmware’s execution speed for easier confirmation in the Jupyter notebook.
# LabD
## Brief Introduction about the Overall System

In this lab, we modify the controller to add a "prefetch" operation. It lets the controller use registers to prefetch SDRAM data around the last accessed address.
The next time the system requests something near the previous address, the controller hits in the prefetched data and outputs it directly, without accessing the SDRAM again.
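
The effect of prefetching can be sketched with a small behavioral model. The class name `PrefetchController`, the window size, and the hit/miss latencies below are hypothetical values chosen for illustration. They are not the parameters of the LabD controller, and writes (which would need to invalidate the buffer) are not modeled.

```python
class PrefetchController:
    """Toy SDRAM controller with a register-based prefetch buffer."""

    def __init__(self, sdram, window=4, miss_latency=10, hit_latency=1):
        self.sdram = sdram                # backing memory, word-addressed list
        self.window = window              # words fetched per miss (assumed)
        self.miss_latency = miss_latency  # cycles for an SDRAM access (assumed)
        self.hit_latency = hit_latency    # cycles for a buffer hit (assumed)
        self.buffer = {}                  # address -> prefetched word

    def read(self, address):
        """Return (data, latency_in_cycles) for one read request."""
        if address in self.buffer:
            return self.buffer[address], self.hit_latency
        # Miss: access SDRAM and prefetch the words following this address.
        end = min(address + self.window, len(self.sdram))
        self.buffer = {a: self.sdram[a] for a in range(address, end)}
        return self.sdram[address], self.miss_latency


if __name__ == "__main__":
    controller = PrefetchController(sdram=list(range(100, 116)))
    total = sum(controller.read(a)[1] for a in range(8))
    print("cycles for 8 sequential reads:", total)
```

Sequential accesses, such as instruction fetches, benefit the most: in this model only every fourth read pays the full SDRAM latency.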
## What is Observed & Learned
SDRAM controller flow

If the refresh flag is raised, the controller ignores the incoming request and performs PRECHARGE and REFRESH first.
The refresh operation needs at least 12T to complete.

# Final Project
We replace the firmware workloads of Lab6 (matmul, fir, and qs) with hardware accelerators. In addition, we add the SDRAM prefetch scheme we designed in LabD to make instruction reads even faster.

All simulations run at 25 ns/cycle.
* Lab6's latency for each workload
mm latency = **73729**, fir latency = **162936**, qs latency = **29525**
```
Reading top.hex
top.hex loaded into memory
Memory 5 bytes = 0x6f 0x00 0x00 0x0b 0x13
LA mm 1 started
mm latency: 73729
Call function matmul() in User Project BRAM (mprjram, 0x38000000) return value passed, 0x003e
Call function matmul() in User Project BRAM (mprjram, 0x38000000) return value passed, 0x0044
Call function matmul() in User Project BRAM (mprjram, 0x38000000) return value passed, 0x004a
Call function matmul() in User Project BRAM (mprjram, 0x38000000) return value passed, 0x0050
LA mm 2 passed
LA fir 1 started
LA fir 2 passed
fir latency: 162936
LA qs 1 started
qs latency: 29525
Call function qsort() in User Project BRAM (mprjram, 0x38000000) return value passed, 0x0028
Received 40
Received 893
Received 2541
Received 2669
LA qs 2 passed
```
* After replacing the workloads with hardware computation, accessed through Wishbone address assignment
mm latency = **3837**, fir latency = **3436**, qs latency = **4218**
```
Reading top.hex
top.hex loaded into memory
Memory 5 bytes = 0x6f 0x00 0x00 0x0b 0x13
VCD info: dumpfile top.vcd opened for output.
start time 1205513
LA qs 1 started
Received 40
Received 893
Received 2541
Received 2669
Received 3233
Received 4267
Received 4622
Received 5681
Received 6023
Received 9073
LA qs 2 passed
QS latency: 4218
start time 1396988
LA fir 1 started
Received: 0
Received: -10
Received: -29
Received: -25
Received: 35
Received: 158
Received: 337
Received: 539
Received: 732
Received: 915
Received: 1098
LA fir 2 passed
fir latency: 3436
start time 1595938
LA mm 1 started
Received: 62
Received: 68
Received: 74
Received: 80
Received: 62
Received: 68
Received: 74
Received: 80
Received: 62
Received: 68
Received: 74
Received: 80
Received: 62
Received: 68
Received: 74
Received: 80
LA mm 2 passed
MM latency: 3837
Congrats
1861813
```
* Finally, we add a prefetch scheme to optimize the reading latency
mm latency = **3000**, fir latency = **2804**, qs latency = **3730**
```
Reading top.hex
top.hex loaded into memory
Memory 5 bytes = 0x6f 0x00 0x00 0x0b 0x13
VCD info: dumpfile top.vcd opened for output.
start time 1180513
LA qs 1 started
Received 40
Received 893
Received 2541
Received 2669
Received 3233
Received 4267
Received 4622
Received 5681
Received 6023
Received 9073
LA qs 2 passed
QS latency: 3730
start time 1359788
LA fir 1 started
Received: 0
Received: -10
Received: -29
Received: -25
Received: 35
Received: 158
Received: 337
Received: 539
Received: 732
Received: 915
Received: 1098
LA fir 2 passed
fir latency: 2804
start time 1542938
LA mm 1 started
Received: 62
Received: 68
Received: 74
Received: 80
Received: 62
Received: 68
Received: 74
Received: 80
Received: 62
Received: 68
Received: 74
Received: 80
Received: 62
Received: 68
Received: 74
Received: 80
LA mm 2 passed
MM latency: 3000
Congrats
1787888
```
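Compared to the original firmware execution, the reported latencies with hardware acceleration and prefetch improve by roughly **8** times (qs), **25** times (mm), and **58** times (fir).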
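
The speedups implied by the three logs can be recomputed as below. The latencies are copied from the logs above. The fir hardware value uses the log line `fir latency: 3436`. The dictionary layout and the stage names `firmware`, `hardware`, and `prefetch` are naming choices made for this sketch.

```python
# Latencies as reported in the three simulation logs above.
LATENCY = {
    "mm":  {"firmware": 73729,  "hardware": 3837, "prefetch": 3000},
    "fir": {"firmware": 162936, "hardware": 3436, "prefetch": 2804},
    "qs":  {"firmware": 29525,  "hardware": 4218, "prefetch": 3730},
}


def speedups(latency):
    """Return the speedup of each stage relative to the firmware baseline."""
    result = {}
    for workload, stages in latency.items():
        baseline = stages["firmware"]
        result[workload] = {
            stage: baseline / stages[stage]
            for stage in ("hardware", "prefetch")
        }
    return result


if __name__ == "__main__":
    for workload, ratios in speedups(LATENCY).items():
        print(f"{workload}: hardware {ratios['hardware']:.1f}x, "
              f"prefetch {ratios['prefetch']:.1f}x")
```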