# [Design Practice Sharing] Part III – Handling Alignment and Cascade in Stream Converters
In hardware data movers, alignment isn’t always perfect, especially when handling AXI-MM transfers that start at unaligned addresses or carry only partial data.
This article shares how I tackled unaligned and narrow transfers while building a **Sparse to Continuous Stream Converter IP**, designed for PCIe DMA scenarios like Xilinx CPM QDMA.
---
## 🚧 The Challenge
AXI-MM interfaces allow transactions to begin at arbitrary byte offsets and send partial beats.
But the AXI-Stream side of this converter expects **fully aligned, gap-free, in-order data**.
So what happens when:
- Data starts at byte 1 instead of byte 0?
- You only receive 3 valid bytes instead of 4?
- The last transfer has just 2 bytes?
You need **alignment handling** and **data cascade logic**.
---
## 🧠 Design Breakdown
To solve this, I implemented:
### 🔹 Partial Buffer
Stores partial input data that cannot yet form a complete AXIS beat.
### 🔹 Shift & Merge Logic
Shifts incoming valid bytes down so they align with the output LSB (Byte0),
then merges them with the buffered data from previous beats.
### 🔹 Valid Byte Counter
Tracks accumulated valid bytes to determine when a full beat is ready.
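To make the interaction of these three blocks concrete, here is a minimal Python sketch of a single merge step. It is a behavioral model, not RTL. The function name `shift_and_merge`, the 4-byte `WIDTH`, and the convention that each input beat carries `count` valid bytes starting at byte lane `offset` are all assumptions made for illustration.

```python
WIDTH = 4  # assumed AXIS beat width in bytes (hypothetical value)

def shift_and_merge(buf, buf_cnt, data, offset, count, width=WIDTH):
    """One merge step of the model.

    buf / buf_cnt : partial buffer contents and its valid byte count
    data          : incoming beat as an integer, Byte0 in the low 8 bits
    offset, count : first valid byte lane and number of valid bytes
    Returns (full_beat_or_None, new_buf, new_buf_cnt).
    """
    # Shift: drop the invalid low lanes so the first valid byte sits at Byte0,
    # and mask off anything above the last valid byte.
    aligned = (data >> (8 * offset)) & ((1 << (8 * count)) - 1)
    # Merge: place the new bytes directly above the bytes already buffered.
    merged = buf | (aligned << (8 * buf_cnt))
    # Valid byte counter: decides whether a full beat can be emitted.
    total = buf_cnt + count
    if total >= width:
        beat = merged & ((1 << (8 * width)) - 1)
        return beat, merged >> (8 * width), total - width
    return None, merged, total

# First beat starts at byte 1, so only 3 bytes are valid: nothing is emitted yet.
print(shift_and_merge(0, 0, 0x44332211, 1, 3))
# The next full beat completes the output and leaves 3 bytes in the buffer.
print(shift_and_merge(0x443322, 3, 0x88776655, 0, 4))
```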
---
## 🔄 Behavior Summary
1. **First beat**:
   If the start offset ≠ 0, shift the data down so the first valid byte lands on the LSB (Byte0).
2. **Intermediate beats**:
   Merge new data with the buffered data from the previous beat.
3. **When ≥ N bytes are collected** (N being the AXIS beat width in bytes):
   Output a full AXIS beat.
4. **Final beat**:
   If < N bytes remain, output the partial data and set `axis_invalid_cnt`.
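The rules above can be checked against a small golden model before writing any RTL. The sketch below is one possible byte-level reference in Python. The function name `sparse_to_continuous` and the input tuple layout are hypothetical, and it assumes that `axis_invalid_cnt` means the number of unused upper bytes in the last beat, which are padded with zeros here.

```python
def sparse_to_continuous(beats, width=4):
    """Byte-level reference model (an illustrative sketch, not the IP itself).

    beats : iterable of (data_bytes, offset, count); data_bytes[0] is Byte0
    width : AXIS beat width in bytes (N in the rules above)
    Returns a list of (out_bytes, axis_invalid_cnt) tuples.
    """
    pending = []  # partial buffer, oldest byte first
    out = []
    for data, offset, count in beats:
        # Rules 1 and 2: keep only the valid bytes and append them in order.
        pending.extend(data[offset:offset + count])
        # Rule 3: emit a full beat as soon as enough bytes have accumulated.
        while len(pending) >= width:
            out.append((pending[:width], 0))
            pending = pending[width:]
    # Rule 4: flush whatever is left as a partial final beat.
    if pending:
        invalid = width - len(pending)
        out.append((pending + [0] * invalid, invalid))
    return out

beats = [
    ([0x00, 0x11, 0x22, 0x33], 1, 3),  # starts at byte 1
    ([0x44, 0x55, 0x66, 0x77], 0, 4),  # full beat
    ([0x88, 0x99, 0x00, 0x00], 0, 2),  # last transfer has just 2 bytes
]
for data, invalid in sparse_to_continuous(beats):
    print([hex(b) for b in data], "invalid bytes:", invalid)
```

Comparing RTL simulation output against a model like this is a cheap way to confirm that byte order survives every beat boundary.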
## 🧩 Data Merge Implementation Strategy
Two candidate architectures are proposed for merging and aligning sparse data into a continuous stream:
### Option 1: Dual Register Merge Strategy
- Uses two registers, each one beat wide (e.g., 64-bit): Reg A and Reg B.
- Reg A holds the first beat, right-shifted by the invalid byte count.
- Reg B holds the next beat, left-shifted by the number of valid bytes in Reg A.
- The output is generated by merging Reg A and Reg B (`Reg A | Reg B`).
- Remaining bytes from Reg B can be retained for the next merge cycle.
**Pros:**
- Simpler control for small widths.
- Per-beat management is intuitive.
**Cons:**
- Requires dual-register coordination.
- Shift logic must be aware of valid byte counts per beat.
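As a rough illustration of Option 1, the following Python sketch models one merge of two beats. It assumes the first beat's invalid bytes occupy the low byte lanes and that the second beat is fully valid. The names `dual_register_merge` and `leftover` are hypothetical; `reg_a` and `reg_b` mirror the registers described above.

```python
def dual_register_merge(first, first_offset, nxt, width=8):
    """Model one Reg A / Reg B merge for a beat of `width` bytes.

    first        : first beat as an integer, Byte0 in the low 8 bits
    first_offset : number of invalid low bytes in `first`
    nxt          : following beat, assumed fully valid
    Returns (out_beat, leftover, leftover_cnt).
    """
    mask = (1 << (8 * width)) - 1
    valid_a = width - first_offset
    # Reg A: first beat, right-shifted by its invalid byte count.
    reg_a = (first & mask) >> (8 * first_offset)
    # Reg B: next beat, left-shifted by the valid bytes already in Reg A.
    # Bytes pushed past the top of the register are truncated here.
    reg_b = (nxt << (8 * valid_a)) & mask
    out_beat = reg_a | reg_b
    # The bytes of `nxt` that did not fit are retained for the next cycle;
    # they take over the role of Reg A with the same valid byte count.
    leftover = (nxt & mask) >> (8 * first_offset)
    return out_beat, leftover, valid_a

out, rest, cnt = dual_register_merge(0x44332211, 1, 0x88776655, width=4)
print(hex(out), hex(rest), cnt)  # 0x55443322 0x887766 3
```

Note that the leftover has the same valid byte count as the original Reg A, so in steady state the two shift amounts stay constant for the whole transfer.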
### Option 2: Wide Shift Register Strategy
- Uses a single register two beats wide (e.g., 128-bit for a 64-bit data width).
- Each new input beat is placed in the upper half.
- Previous data is shifted to the lower half automatically.
- A global right shift is applied based on the first beat's invalid byte count.
- The lower half then represents the final merged aligned output.
**Pros:**
- Compact implementation with centralized shift logic.
- Easier to pipeline and scale.
**Cons:**
- Consumes more registers.
- May stress timing paths at higher data widths.
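Option 2 can be modelled the same way. The Python class below is a behavioral sketch only; the class name `WideShiftRegister` and its `push` method are hypothetical. It assumes the shift amount is fixed by the first beat's invalid byte count and that the caller discards the value returned by the very first `push`, since the register is still priming at that point.

```python
class WideShiftRegister:
    """Behavioral model of a double-width shift register (illustrative only)."""

    def __init__(self, first_offset, width=8):
        self.width = width                   # beat width in bytes
        self.shift = 8 * first_offset        # global right shift, in bits
        self.mask = (1 << (8 * width)) - 1   # one-beat mask
        self.reg = 0                         # 2 * width bytes of state

    def push(self, beat):
        half = 8 * self.width
        # The previous upper half drops into the lower half,
        # and the new beat is loaded into the upper half.
        self.reg = (self.reg >> half) | ((beat & self.mask) << half)
        # After the global right shift, the lower half is the aligned output.
        return (self.reg >> self.shift) & self.mask

wsr = WideShiftRegister(first_offset=1, width=4)
wsr.push(0x44332211)              # priming: this return value is not valid output
print(hex(wsr.push(0x88776655)))  # 0x55443322
print(hex(wsr.push(0)))           # 0x887766, the tail with one unused upper byte
```

Pushing a zero beat at the end flushes the remaining bytes, which corresponds to the partial final beat in the behavior summary.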
## 🖼️ Example Diagram
### Transaction from Sparse to Continuous

### Dual Register Merge Strategy

### Wide Shift Register Strategy

---
## ✅ Key Takeaways
- **Valid data always starts at Byte0** on the AXIS output.
- **Order must be preserved** across beat boundaries.
- **Cascading** allows data to be carried forward until complete.
This mechanism ensures smooth AXI-Stream flow even when source data is sparse, misaligned, or narrow.
---
#AXIStream #IPDesign #DataMover #FPGA #StreamingArchitecture #QDMA #UnalignedTransfer #SystemDesign #RTL