> The system is not instantaneously consistent, but eventually consistent, and must be traceable and repairable
## Step 6: Eventual Consistency Architecture — Traceable and Repairable System Design
### Core Principle: Why Do We Accept "Eventual Consistency"?
In a flash sale system, demanding **instantaneous consistency** (every data store updated in the same instant) causes problems:
- After Redis deducts inventory, a synchronous MySQL write is slow and blocks the user response
- The payment callback waits for the MySQL transaction to complete → timeouts and retry storms
- Any step failing → the entire flow rolls back, and the user experience suffers
**Eventual consistency** advantages:
- **Fast response**: Redis deducts inventory and returns immediately; the user never waits for the MySQL write
- **Traffic shaping**: high-concurrency requests enter a Kafka queue and are consumed at a steady pace, so the database is never overwhelmed
- **Fault tolerance**: if a step fails (e.g., MySQL goes down), messages stay in Kafka and processing resumes after recovery
**But the cost is:**
- Redis says "inventory deducted", but MySQL may not have written the order yet
- Payment succeeded, but order status may still be `PENDING` (Kafka consumption delay)
- **Without tracking and repair mechanisms, data will be permanently inconsistent!**
### How Do We Achieve "Traceable and Repairable"?
#### **1. Every Link Records Status + Unique ID**
| Component | Recorded Content | Purpose |
|---|---|---|
|**Redis**|`stock:{skuId}` current inventory|Real-time deduction, but volatile (data lost on a crash)|
|**Kafka**|Message includes `messageId`, `orderId`, `timestamp`|Replayable and traceable; prevents duplicate consumption|
|**MySQL orders**|`status`, `created_at`, `updated_at`|Order state machine; shows which step an order is stuck at|
|**MySQL payments**|`transaction_id`, `status`, `callback_time`|Third-party payment reconciliation and retry records|
|**inventory_audit**|`operation`, `quantity`, `ref_id`, `timestamp`|**Append-only audit log**; every inventory change can be backtracked|
**Example:**
```sql
-- inventory_audit table design
CREATE TABLE inventory_audit (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
sku_id INT NOT NULL,
operation ENUM('reserve', 'decrease', 'release', 'refund'),
quantity INT NOT NULL,
ref_id VARCHAR(50), -- order_id or payment_id
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
**Why is it important?**
- An order's inventory was deducted but payment failed → query `inventory_audit` for the `reserve` record → the compensation mechanism releases the inventory
- After a Redis crash, the MySQL data survives → Redis inventory can be rebuilt from `inventory_audit`
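The rebuild from the audit log can be sketched as below. This is a minimal sketch, not a definitive implementation: the sign table is inferred from the operation names in the DDL above, each audit row is assumed to be an independent signed delta (a flow that writes both a `reserve` and a matching `decrease` for the same units would first need deduplication by `ref_id`), and `db`/`redis` are the same assumed clients as in the other examples.

```javascript
// Net stock effect per operation. Assumption: 'release' and 'refund'
// add stock back, 'reserve' and 'decrease' subtract it.
const OP_SIGN = { reserve: -1, decrease: -1, release: +1, refund: +1 };

// Pure helper: fold audit rows into a net delta for one SKU.
// Unknown operations are ignored rather than guessed at.
function netDelta(auditRows) {
  return auditRows.reduce(
    (sum, row) => sum + (OP_SIGN[row.operation] ?? 0) * row.quantity,
    0
  );
}

// Side-effectful sketch: read the audit trail from MySQL and rewrite
// the Redis counter. `initialStock` is the stock before any audit rows.
async function rebuildRedisStock(db, redis, skuId, initialStock) {
  const [rows] = await db.query(
    'SELECT operation, quantity FROM inventory_audit WHERE sku_id = ?',
    [skuId]
  );
  const current = initialStock + netDelta(rows);
  await redis.set(`stock:${skuId}`, String(current));
  return current;
}
```

Because the log is append-only, this rebuild can be re-run at any time and always converges to the same value.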
#### **2. State Machine Design: Make Every State Traceable**
```
Order state machine:
PENDING (order placed successfully, waiting for payment)
→ PAID (payment completed, waiting for confirmation)
→ CONFIRMED (order confirmed, preparing to ship)
→ SHIPPED (shipped)
→ DELIVERED (delivered)
→ REFUNDED (refunded)
Payment state machine:
PENDING (waiting for payment)
→ COMPLETED (payment successful)
→ FAILED (payment failed)
→ REFUNDED (refunded)
```
**Key points:**
- **Each state is a clear checkpoint** → when the flow stalls, we know exactly which step it stalled at
- **State transitions follow rules** → an order cannot jump straight from `PENDING` to `SHIPPED`, which prevents data corruption
- **Payment and order status are decoupled** → `payments.status = COMPLETED` does not imply `orders.status = CONFIRMED` (Kafka consumption may lag)
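The transition rules can be enforced in code with a small lookup table. A minimal sketch; treating `REFUNDED` as reachable from every post-payment state is an assumption, since the diagram above lists it without an explicit source state.

```javascript
// Allowed order-state transitions, taken from the state machine above.
// Assumption: REFUNDED can follow any post-payment state.
const ORDER_TRANSITIONS = {
  PENDING:   ['PAID'],
  PAID:      ['CONFIRMED', 'REFUNDED'],
  CONFIRMED: ['SHIPPED', 'REFUNDED'],
  SHIPPED:   ['DELIVERED', 'REFUNDED'],
  DELIVERED: ['REFUNDED'],
  REFUNDED:  [],
};

function canTransition(from, to) {
  return (ORDER_TRANSITIONS[from] ?? []).includes(to);
}

// Guarded update: refuse illegal jumps before touching the database.
function assertTransition(from, to) {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal order transition: ${from} -> ${to}`);
  }
}
```

Calling `assertTransition` before every `UPDATE orders SET status = ...` is what makes "cannot jump from `PENDING` to `SHIPPED`" an enforced invariant rather than a convention.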
**How to repair:**
**Scenario.** In the normal flow:

1. User places an order → order status: `PENDING`
2. Payment succeeds → payment status: `COMPLETED`
3. The system (a Kafka consumer) updates the order from `PENDING` to `CONFIRMED`

But step 3 can fail:

- The Kafka consumer crashes
- The application process crashes
- Network jitter loses the message

The result: payment succeeded, but the order is stuck at `PENDING`. The query below finds exactly these anomalies: payments that completed while the order was never updated.
```sql
-- Scheduled task: find payments that succeeded while the order is still PENDING
SELECT o.id, o.status, p.status
FROM orders o
JOIN payments p ON o.id = p.order_id
WHERE p.status = 'COMPLETED'
  AND o.status = 'PENDING'
  AND o.updated_at < NOW() - INTERVAL 5 MINUTE;
-- For each hit: re-trigger the Kafka message or update the order status directly
```
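A sketch of the scheduled repair pass built on that query. The `needsRepair` predicate mirrors the SQL conditions; `db` and `kafka` are the same assumed clients as elsewhere, and re-sending on the `payment.completed` topic (rather than patching rows directly) is one possible repair strategy, chosen so downstream consumers also catch up.

```javascript
// Pure predicate mirroring the repair query: payment completed, order
// still PENDING, and the order stale for more than 5 minutes.
const STALE_MS = 5 * 60 * 1000;

function needsRepair(order, payment, now = Date.now()) {
  return (
    payment.status === 'COMPLETED' &&
    order.status === 'PENDING' &&
    now - order.updatedAt.getTime() > STALE_MS
  );
}

// Scheduled repair pass over the anomaly query.
async function repairStuckOrders(db, kafka) {
  const [rows] = await db.query(
    `SELECT o.id AS orderId, p.transaction_id AS transactionId
       FROM orders o JOIN payments p ON o.id = p.order_id
      WHERE p.status = 'COMPLETED' AND o.status = 'PENDING'
        AND o.updated_at < NOW() - INTERVAL 5 MINUTE`
  );
  for (const { orderId, transactionId } of rows) {
    // Re-trigger the normal flow instead of patching rows directly,
    // so every downstream consumer catches up as well
    await kafka.send('payment.completed', { orderId, transactionId });
  }
}
```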
#### **3. Idempotent Design: Prevent Duplicate Operations**
**Problem scenarios:**
- Payment callback sent repeatedly (network jitter, third-party retry)
- Kafka message duplicate consumption (consumer rebalance)
- User repeatedly clicks order button
**Solutions:**
| Scenario | Idempotency Mechanism | Implementation |
| ------------ | -------------------------- | ------------------------------------------------- |
| **Payment callback** | Check if `transaction_id` has been processed | `INSERT IGNORE INTO payments ...` or Redis `SETNX` |
| **Kafka consumption** | Check `messageId` or `orderId` | Redis `SET message:{id} 1 NX EX 3600` |
| **Inventory deduction** | Lua script atomic check | `if stock > 0 then decr else return 0` |
**Code example:**
```javascript
// Payment callback: idempotent handling
async function handlePaymentCallback(orderId, transactionId) {
  const lockKey = `lock:payment:${orderId}`;
  // Distributed lock: prevents two callbacks for the same order from
  // being processed simultaneously
  const acquired = await redis.set(lockKey, '1', 'NX', 'EX', 10);
  if (!acquired) {
    console.log('Concurrent callback for this order, ignoring');
    return;
  }
  try {
    // Idempotency check: has this transaction already been processed?
    const existing = await db.query(
      'SELECT id FROM payments WHERE transaction_id = ?',
      [transactionId]
    );
    if (existing.length > 0) {
      console.log('This transactionId has already been processed');
      return;
    }
    // Update payment and order status; store the transaction_id so the
    // idempotency check above can find it on the next duplicate callback
    await db.query(
      'UPDATE payments SET status = "COMPLETED", transaction_id = ? WHERE order_id = ?',
      [transactionId, orderId]
    );
    await db.query('UPDATE orders SET status = "CONFIRMED" WHERE id = ?', [orderId]);
    // Notify downstream consumers
    await kafka.send('payment.completed', { orderId, transactionId });
  } finally {
    await redis.del(lockKey);
  }
}
```
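The "Lua script atomic check" row in the table can be sketched like this. The script generalizes the single-unit check (`if stock > 0`) to an arbitrary quantity, which is an assumption; the `eval` call follows the node-redis v4 API, and other client versions differ. The pure `deductDecision` helper restates the same rule for testing outside Redis.

```javascript
// Atomic stock deduction inside Redis: the Lua script runs atomically,
// so the check and the decrement can never interleave across requests.
const DEDUCT_LUA = `
local stock = tonumber(redis.call('GET', KEYS[1]) or '0')
local qty = tonumber(ARGV[1])
if stock >= qty then
  redis.call('DECRBY', KEYS[1], qty)
  return 1
else
  return 0
end
`;

// Pure reference implementation of the same rule.
function deductDecision(stock, qty) {
  return stock >= qty
    ? { ok: true, remaining: stock - qty }
    : { ok: false, remaining: stock };
}

// Sketch of invoking the script (node-redis v4 style `eval` options).
async function tryDeduct(redis, skuId, qty) {
  const result = await redis.eval(DEDUCT_LUA, {
    keys: [`stock:${skuId}`],
    arguments: [String(qty)],
  });
  return result === 1;
}
```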
#### **4. Compensation Mechanism: Repairing Inconsistencies**
**Problem scenarios:**
- Redis deducts inventory successfully → Kafka message lost → MySQL has no order record
- Payment succeeded → Order status update failed (MySQL downtime) → User's money deducted, but order still `PENDING`
- User cancels order → Inventory not released back to Redis
**Solution: Outbox Pattern**
```sql
-- outbox table: Ensures message will definitely be sent
CREATE TABLE outbox (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
aggregate_id VARCHAR(50), -- order_id
event_type VARCHAR(50), -- 'order.created', 'payment.completed'
payload JSON,
processed BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- When placing order: Write orders + outbox in the same transaction
BEGIN;
INSERT INTO orders (id, user_id, total, status) VALUES (...);
INSERT INTO order_items (...);
INSERT INTO outbox (aggregate_id, event_type, payload)
VALUES (order_id, 'order.created', JSON_OBJECT('orderId', order_id, ...));
COMMIT;
-- Background task: Periodically scan outbox table, send to Kafka
SELECT * FROM outbox WHERE processed = FALSE ORDER BY created_at LIMIT 100;
-- After sending to Kafka, mark processed = TRUE
```
**Why is it effective?**
- MySQL transaction guarantee: Order write succeeds ↔ outbox record must exist
- Even if Kafka temporarily crashes, outbox messages remain in database, retry later
- **Guarantees eventual consistency: Messages will definitely be sent**
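The background relay described above might look like this. A sketch under stated assumptions: `db` is a mysql2-style promise pool, `producer` wraps a Kafka client, and both are injected so the loop can be tested with fakes; table and column names follow the outbox DDL above.

```javascript
// Outbox relay: scan unprocessed rows and publish each one to Kafka.
async function relayOutbox(db, producer, batchSize = 100) {
  const [rows] = await db.query(
    'SELECT id, event_type, payload FROM outbox WHERE processed = FALSE ORDER BY created_at LIMIT ?',
    [batchSize]
  );
  for (const row of rows) {
    // Send first, mark after: a crash between the two steps means the
    // message is re-sent on the next pass, so consumers must be
    // idempotent (see section 3)
    await producer.send(row.event_type, row.payload);
    await db.query('UPDATE outbox SET processed = TRUE WHERE id = ?', [row.id]);
  }
  return rows.length;
}
```

Running this on a short timer gives at-least-once delivery: messages may repeat, but none can be lost while the outbox row survives.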
#### **5. Async Task Queue: Kafka Consumer Responsibilities**
The Kafka consumer is what delivers eventual consistency:
- The Redis deduction is "quasi-real-time" inventory (deduct first to prevent overselling)
- The Kafka consumer then **syncs the deduction into MySQL** (the authoritative, final inventory record)
- Then it appends the audit log entry (traceability)
- Then it marks the message as processed (idempotency: the same order is never deducted twice)
| Phase | **Synchronous Phase** (User Request) | **Async Phase** (Kafka Consumer) |
| --- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| What it does | Redis deducts inventory<br>(Because Redis is fast, supports high concurrency,<br>handles traffic peaks)<br>MySQL writes order status: PENDING | Kafka Consumer consumes "order.created"<br>MySQL deducts real inventory (final inventory record must be accurate)<br>Write audit log (traceable, prevents black box, enables reconciliation)<br>Mark processing complete (idempotent, avoid duplicate inventory deduction) |
| Characteristics | Fast, lightweight, absolutely no complex logic | Doesn't block user, slow processing, can retry, guarantees eventual consistency |
| Purpose | Let user immediately see "order placed successfully" | Let system's "accounts" ultimately be accurate and consistent |
```javascript
// Kafka Consumer: consume "order.created" messages
kafka.subscribe('order.created', async (message) => {
  const { orderId, skuId, quantity } = message;
  // Idempotency: claim the order atomically with SET NX, so two
  // concurrent deliveries of the same message cannot both proceed
  const claimed = await redis.set(`processed:order:${orderId}`, '1', 'NX', 'EX', 86400);
  if (!claimed) return;
  try {
    // Sync the deduction into MySQL (the authoritative inventory record)
    await db.query(
      'UPDATE skus SET stock = stock - ? WHERE id = ?',
      [quantity, skuId]
    );
    // Append to the audit log (traceability)
    await db.query(
      'INSERT INTO inventory_audit (sku_id, operation, quantity, ref_id) VALUES (?, "decrease", ?, ?)',
      [skuId, quantity, orderId]
    );
  } catch (err) {
    // Release the claim so a redelivery can retry the whole unit of work
    await redis.del(`processed:order:${orderId}`);
    throw err;
  }
});
```
### How Does the System Become "Traceable and Repairable"?
| Problem Scenario | How to Discover? | How to Repair? |
| ------------------------- | ----------------------------------------------------------- | ----------------------------------------------- |
| **Redis deducted inventory, MySQL has no order** | Query Redis `stock` vs MySQL `skus.stock` difference | From `inventory_audit` find orphan deductions, manually create order or release inventory |
| **Payment succeeded, order still PENDING** | Query `payments.status = COMPLETED` but `orders.status = PENDING` | Resend Kafka message or directly update order status |
| **Kafka message lost** | `outbox.processed = FALSE` for more than 10 minutes | Background task resend |
| **Expired unpaid order still holds inventory** | `orders.expired_at < NOW()` and `payments.status != 'COMPLETED'` | Release the reserved inventory |
| **User cancelled order, inventory not released** | Order status `REFUNDED` but Redis inventory didn't increase | Compensation task: `INCR stock:{skuId}` + write `inventory_audit` |
**Monitoring tools:**
- **Reconciliation task**: Hourly compare Redis inventory vs MySQL inventory
- **Anomaly alerts**: Orders stuck at `PENDING` for more than 30 minutes
- **Audit log queries**: `SELECT * FROM inventory_audit WHERE ref_id = 'order_123'` to view complete inventory change history
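The hourly reconciliation task could be sketched as below. One caveat, stated as an assumption: Redis legitimately runs ahead of MySQL while Kafka messages are in flight, so in practice drift is only meaningful after the queue drains or within a tolerance window. The `db` and `redis` clients are the same assumed globals as in the other examples.

```javascript
// Pure comparison: both arguments map skuId -> stock count.
// Returns the SKUs whose Redis and MySQL counts disagree.
function findDrift(redisStocks, mysqlStocks) {
  return Object.keys(mysqlStocks)
    .filter((skuId) => redisStocks[skuId] !== mysqlStocks[skuId])
    .map((skuId) => ({
      skuId,
      redis: redisStocks[skuId],
      mysql: mysqlStocks[skuId],
    }));
}

// Side-effectful wrapper: load both sides, then compare.
async function reconcile(db, redis) {
  const [rows] = await db.query('SELECT id, stock FROM skus');
  const mysqlStocks = Object.fromEntries(rows.map((r) => [r.id, r.stock]));
  const redisStocks = {};
  for (const skuId of Object.keys(mysqlStocks)) {
    redisStocks[skuId] = Number(await redis.get(`stock:${skuId}`));
  }
  const drift = findDrift(redisStocks, mysqlStocks);
  if (drift.length > 0) {
    // Each drifting SKU can then be explained via inventory_audit
    console.warn('Stock drift detected, check inventory_audit:', drift);
  }
  return drift;
}
```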
### Summary: Cost and Benefits of Eventual Consistency
#### **What Did We Sacrifice?**
- **Instantaneous consistency**: After Redis deducts inventory, MySQL may delay a few seconds before having the order
- **Complexity**: Need state machine, idempotency, compensation, monitoring
#### **What Did We Gain?**
- **High concurrency capability**: in a 10,000-user flash sale, Redis responds instantly and MySQL is not overwhelmed
- **System resilience**: if any component fails (Redis, Kafka, MySQL), the system can recover
- **Traceable**: All operations have logs (`inventory_audit`, `outbox`, Kafka offset)
- **Repairable**: Through compensation mechanisms, reconciliation tasks, data will eventually be consistent
#### **Key Design Principles:**
1. **Every operation has a unique ID** (orderId, transactionId, messageId)
2. **Every state is queryable** (state machine + timestamp)
3. **Every change is recorded** (append-only audit log)
4. **Every message is idempotent** (prevents duplicate processing)
5. **Every failure is compensatable** (Outbox Pattern + scheduled tasks)
> **"The system is not instantaneously consistent, but eventually consistent, and must be traceable and repairable"**
> This is not a compromise, but the only feasible architectural choice under high concurrency.
> The key is: **When inconsistency occurs, we have the ability to discover and repair it.**