# 🔧 Lessons Learned: Using a Req/Ack Handshake Protocol for CPU-Accessible Control & Status Registers (CSR) - PART II
In Part I, we discussed the basics of using a `req/ack` handshake protocol to access hardware control/status registers. However, a **1-cycle `ack` pulse** is problematic when the initiator is a CPU that polls from software, because the pulse can come and go between two reads.
This article presents a robust solution: using an `ack_counter` to **track request completions** reliably without relying on precise timing.
---
## ❓ The Problem: CPU May Miss the 1-Cycle `ack`
In typical req/ack implementations:
- The target module asserts `ack` for **only one clock cycle**
- Software (CPU) polls `ack` periodically
- If the pulse falls between two polls, the CPU **misses it** and may spin forever waiting for an `ack` that has already passed
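
To make the failure mode concrete, here is a small software model (not RTL) of a CPU that samples a 1-cycle `ack` pulse only every `poll_period` cycles. The function name and its parameters are hypothetical, chosen purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical model: the target pulses ack for exactly one cycle
// (cycle == ack_cycle); the CPU samples ack only on cycles that are
// multiples of poll_period. Returns true if any sample sees ack high.
bool cpu_sees_ack_pulse(uint32_t ack_cycle, uint32_t poll_period,
                        uint32_t total_cycles)
{
    assert(poll_period > 0);
    for (uint32_t cycle = 0; cycle < total_cycles; cycle++) {
        bool ack = (cycle == ack_cycle);               // 1-cycle pulse
        bool cpu_samples = (cycle % poll_period == 0); // CPU read happens here
        if (cpu_samples && ack) {
            return true;
        }
    }
    return false;
}
```

With `poll_period = 4`, a pulse at cycle 7 is never observed, while a pulse at cycle 8 happens to be caught. Only a CPU that samples on every cycle is safe, which software polling over a bus cannot promise.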
---
## ✅ The Solution: Use an `ack_counter`
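Instead of relying on `ack` pulses, we expose a **counter** (`ack_counter`) that increments **every time an ack event occurs** and holds its value until the next one. The counter has a finite width and wraps on overflow, so software checks for a *change* in value rather than for a specific value.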
### 🔧 Interface Signals
| Signal | Direction | Description |
|----------------|-------------------|-------------|
| `req` | CPU → Target | Asserted by CPU to initiate request (held high) |
| `ack_counter` | Target → CPU | Counter incremented on each `ack` event |
| `ack_data` | Target → CPU | Optional data returned on read completion |
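
One way software might view these signals is as a small memory-mapped register block. The layout below is an assumption for illustration only: the field order, the 32-bit word size, the 8-bit counter width, and the names `csr_regs_t` and `csr_ack_count` are not taken from the design above. A real register map comes from the hardware specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical register layout: one 32-bit word per signal.
typedef struct {
    volatile uint32_t req;         // offset 0x0: CPU -> target, bit 0 = req
    volatile uint32_t ack_counter; // offset 0x4: target -> CPU, low 8 bits = count
    volatile uint32_t ack_data;    // offset 0x8: target -> CPU, returned data
} csr_regs_t;

// Return the 8-bit completion count from the (assumed) ack_counter word.
static inline uint8_t csr_ack_count(const csr_regs_t *regs)
{
    assert(regs != NULL);
    return (uint8_t)(regs->ack_counter & 0xFFu);
}
```

The `volatile` qualifier matters here: without it the compiler is free to hoist the `ack_counter` read out of a polling loop.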
---
## 🔁 CPU Polling Flow Example
```c
// read()/write() stand for the platform's register accessors.
// Step 1: Snapshot the current ack_counter value
uint8_t prev_ack = read(ACK_COUNTER);
// Step 2: Assert req to start the transaction
write(REQ, 1);
// Step 3: Poll until ack_counter changes
while (read(ACK_COUNTER) == prev_ack) {
    // busy-wait
}
// Step 4: Ack observed → clear req
write(REQ, 0);
// Step 5 (Optional): Read returned data
uint32_t result = read(ACK_DATA);
```
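Because `ack_counter` holds its new value until the next ack, software observes the completion no matter how late it polls, even if the ack arrives asynchronously. This assumes one outstanding request at a time, so the counter cannot wrap back to `prev_ack` between two reads.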
## 💡 Advantages of `ack_counter`
* ✅ Safe for CPU/software polling — no risk of missing an ack
* ✅ Works in mixed-clock or bus-based systems (e.g., APB, AXI-lite)
* ✅ Can carry additional context (e.g., sequence ID, error status)
* ✅ Can be extended to support out-of-order or buffered responses
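
As a sketch of the "additional context" idea, the counter could share a status word with an error flag and a sequence ID. The bit layout, the type `ack_status_t`, and the helpers `decode_ack_status` and `acks_since` are all hypothetical; they only illustrate how software might unpack such a word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical layout of a 32-bit status word (not from the original design):
//   [7:0]   ack_counter
//   [8]     error flag for the most recent completion
//   [23:16] sequence ID of the most recent completion
typedef struct {
    uint8_t ack_count;
    bool    error;
    uint8_t seq_id;
} ack_status_t;

// Unpack the assumed status word into its fields.
void decode_ack_status(uint32_t word, ack_status_t *out)
{
    assert(out != NULL);
    out->ack_count = (uint8_t)(word & 0xFFu);
    out->error     = ((word >> 8) & 0x1u) != 0;
    out->seq_id    = (uint8_t)((word >> 16) & 0xFFu);
}

// Completions between two snapshots; the 8-bit modular subtraction
// stays correct across a single wrap of the counter.
uint8_t acks_since(uint8_t prev, uint8_t now)
{
    return (uint8_t)(now - prev);
}
```

Reading count, error flag, and sequence ID in a single register access means software sees a consistent snapshot, and a sequence ID gives it something to match against if responses are ever buffered or returned out of order.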
### 📈 Timing Diagram – Real Example
The diagram below illustrates the `req/ack` protocol with a persistent `cpu_req` signal and a `cpu_ack_cnt` mechanism.
- `cpu_req` is held high until the CPU observes `cpu_ack_cnt` change (from 0 → 1)
- `module_ack` is a 1-cycle pulse triggered by internal completion
- `module_ack_data` is written in response (e.g., `"pass"`)
- `cpu_data` holds the received value, and `cpu_ack_cnt` increments

- WaveDrom source
``` json
{signal: [
{name: 'cpu_clk', wave: 'P........', period:3},
{name: 'module_ready', wave: '01...|...', period:3},
{name: 'cpu_req', wave: '0.1..|..0', period:3},
{name: 'cpu_req_cmd', wave: 'x.6..|..x',data:["CMD"], period:3},
{name: 'cpu_data', wave: 'x....|5..',data:["pass"], period:3},
{name: 'cpu_ack_cnt', wave: '2....|5..',data:[0,1], period:3},
{},
{name: 'module_clk', wave: 'P.............', period:2},
{name: 'module_req', wave: '0...1...0.....', period:2},
{name: 'module_cmd', wave: 'x...6...x.....',data:["CMD"], period:2},
{name: 'module_ack', wave: '0......10.....', period:2},
{name: 'module_ack_data', wave: 'x......2x.....',data:["pass"], period:2},
]}
```
* NOTE:
  * The diagram deliberately puts the CPU and the module on different clocks to show the handshake crossing clock domains.
  * This method makes it easy to isolate the two clock domains.
  * An interrupt would also solve the missed-ack problem; the counter is simply another way to approach it.
## 🧠 Summary
By replacing a fragile 1-cycle `ack` pulse with a persistent `ack_counter`, we:
* Eliminate the risk of software missing the acknowledgment
* Make the handshake CPU-friendly and timing-agnostic
* Enable debug visibility into how many requests were acknowledged
This simple enhancement significantly improves the robustness and observability of your hardware-software interface.
---
#RTLdesign #HandshakeProtocol #CPUInterface #SoCIntegration #FPGA #CSRdesign #EmbeddedSystems #VerificationFriendly