# Redis Assessment
> **Context**: Internal system, <100 users, 2-3 production instances, session affinity enabled
---
## Current State
| Component | Storage | Shared? |
|-----------|---------|---------|
| Spring Cache (20+) | ConcurrentHashMap | No |
| AI Jobs | In-memory only | No |
| Async Futures | ConcurrentHashMap | No |
| Business Processes | Oracle DB | Yes |
| **Session Affinity** | Load Balancer | ✅ Enabled |
---
## Session Affinity (Already Implemented)
Session affinity routes the same user to the same instance, which mitigates:
- ✅ AI Jobs "not found" (same user → same instance)
- ✅ Async Futures "not found" (same user → same instance)
Session affinity does NOT solve:
- ❌ Cache duplication (each instance still has own cache)
- ❌ Rolling deployments (instance replaced → state lost)
- ❌ Instance failure (user re-routed → state lost)
- ❌ Scale up/down (user may be re-routed → state lost)
- ❌ Uneven load distribution (active users cluster on same instances)
- ❌ Network/device changes (client IP or device changes → affinity may be lost, depending on how the load balancer keys it)
- ❌ Cache eviction coordination (an eviction on one instance does not reach the others)
---
## Remaining Issues
### Cache Duplication
Each instance caches independently:
- Same data cached 2-3 times across instances
- Entire cache is cleared every 24 hours (no per-key TTL)
- Memory usage multiplied by instance count
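
The effect of independent caches on eviction can be sketched with a small in-memory model. This is an illustration only: `CachingInstance`, `EvictionDemo`, and the `"rate"` key are hypothetical names, and a shared `ConcurrentHashMap` stands in for the role a shared cache such as Redis would play.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical model of one application instance with a cache in front of a loader.
class CachingInstance {
    private final Map<String, String> cache;
    private final Function<String, String> loader;

    CachingInstance(Map<String, String> cache, Function<String, String> loader) {
        this.cache = cache;
        this.loader = loader;
    }

    // Returns the cached value, loading it from the source on a miss.
    String get(String key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Removes the key from this instance's cache only.
    void evict(String key) {
        cache.remove(key);
    }
}

class EvictionDemo {
    // Returns what instance B serves after the source changes and only A evicts.
    static String readFromBAfterEvictOnA(boolean sharedCache) {
        Map<String, String> source = new ConcurrentHashMap<>();
        source.put("rate", "v1");

        Map<String, String> cacheA = new ConcurrentHashMap<>();
        // With sharedCache=true both instances use the same map (assumed Redis-like role).
        Map<String, String> cacheB = sharedCache ? cacheA : new ConcurrentHashMap<>();

        CachingInstance a = new CachingInstance(cacheA, source::get);
        CachingInstance b = new CachingInstance(cacheB, source::get);

        a.get("rate");            // both instances warm their cache with v1
        b.get("rate");
        source.put("rate", "v2"); // underlying data changes
        a.evict("rate");          // eviction is issued on instance A only
        return b.get("rate");
    }

    public static void main(String[] args) {
        System.out.println("independent caches: B serves " + readFromBAfterEvictOnA(false));
        System.out.println("shared cache:       B serves " + readFromBAfterEvictOnA(true));
    }
}
```

With independent caches, instance B keeps serving the stale `v1` until its own cache is cleared; with one shared cache, the eviction issued on A is visible to B on its next read.
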
### Gemini Cache Registry Mismatch
Each instance has its own local registry of Gemini cached content:
- Instance A creates Gemini cache → stores in local registry
- Instance B doesn't know about it → may create duplicate cache
- Result: Duplicate Gemini cache creation, increased costs
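
The mismatch above can be reproduced with a minimal sketch. Everything here is assumed for illustration: `FakeGeminiApi` is a stand-in that only counts create calls (it is not the real Gemini client), and `AppInstance`, `GeminiRegistryDemo`, and the `"policy-docs"` key are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the remote caching API; it only counts create calls.
class FakeGeminiApi {
    private final AtomicInteger created = new AtomicInteger();

    String createCachedContent(String contentKey) {
        return "cache-" + created.incrementAndGet();
    }

    int createdCount() {
        return created.get();
    }
}

// One application instance. The registry maps a content key to a cache name.
class AppInstance {
    private final Map<String, String> registry;
    private final FakeGeminiApi api;

    AppInstance(Map<String, String> registry, FakeGeminiApi api) {
        this.registry = registry;
        this.api = api;
    }

    // Reuses the registered cache name, or creates one if this registry has none.
    String getOrCreate(String contentKey) {
        return registry.computeIfAbsent(contentKey, api::createCachedContent);
    }
}

class GeminiRegistryDemo {
    // Each instance owns its registry, as in the current setup.
    static int createsWithLocalRegistries() {
        FakeGeminiApi api = new FakeGeminiApi();
        AppInstance a = new AppInstance(new ConcurrentHashMap<>(), api);
        AppInstance b = new AppInstance(new ConcurrentHashMap<>(), api);
        a.getOrCreate("policy-docs");
        b.getOrCreate("policy-docs");
        return api.createdCount();
    }

    // Both instances consult one registry (the role a shared store would play).
    static int createsWithSharedRegistry() {
        FakeGeminiApi api = new FakeGeminiApi();
        Map<String, String> shared = new ConcurrentHashMap<>();
        AppInstance a = new AppInstance(shared, api);
        AppInstance b = new AppInstance(shared, api);
        a.getOrCreate("policy-docs");
        b.getOrCreate("policy-docs");
        return api.createdCount();
    }

    public static void main(String[] args) {
        System.out.println("local registries: " + createsWithLocalRegistries() + " creates");
        System.out.println("shared registry:  " + createsWithSharedRegistry() + " creates");
    }
}
```

In this model the same content is created twice with local registries and once with a shared registry. A real shared registry would also need to handle two instances creating the same entry at the same moment, which this sketch does not cover.
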
### Deployments / Scale Events
```
User starts job → Instance A
Deployment/scale event → Instance A replaced or removed
User polls → Routed to different instance → "not found"
```
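
The flow above can be modelled in a few lines. This is a sketch under assumptions: `JobInstance`, `DeploymentDemo`, the `"job-1"` id, and the `"RUNNING"` status are hypothetical, and a map passed to both instances stands in for an external store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of one instance that tracks job status in a map.
class JobInstance {
    private final Map<String, String> jobs;

    JobInstance(Map<String, String> jobs) {
        this.jobs = jobs;
    }

    void startJob(String jobId) {
        jobs.put(jobId, "RUNNING");
    }

    // Empty result corresponds to the "not found" response.
    Optional<String> poll(String jobId) {
        return Optional.ofNullable(jobs.get(jobId));
    }
}

class DeploymentDemo {
    // In-memory state: the replacement instance starts with an empty map.
    static Optional<String> pollAfterReplacementLocal() {
        JobInstance a = new JobInstance(new ConcurrentHashMap<>());
        a.startJob("job-1");
        JobInstance replacement = new JobInstance(new ConcurrentHashMap<>());
        return replacement.poll("job-1");
    }

    // Externalized state: the replacement instance reads the same store.
    static Optional<String> pollAfterReplacementShared() {
        Map<String, String> shared = new ConcurrentHashMap<>();
        JobInstance a = new JobInstance(shared);
        a.startJob("job-1");
        JobInstance replacement = new JobInstance(shared);
        return replacement.poll("job-1");
    }

    public static void main(String[] args) {
        System.out.println("local:  " + pollAfterReplacementLocal().orElse("not found"));
        System.out.println("shared: " + pollAfterReplacementShared().orElse("not found"));
    }
}
```

Note that shared state only makes the job record visible to the new instance. Work that was executing on the replaced instance is still interrupted, so the status would need to be updated or the job retried by some other mechanism.
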
---
## Options
### Option 1: Keep Current (Session Affinity)
- Works for most normal operations
- In-memory state is lost during deployments, instance failures, or scale up/down
- Cache duplication acceptable for current scale
### Option 2: Add Redis
- Shared cache across instances
- Cache changes (evict/warm-up) affect all instances at once
- Per-key TTL (vs current 24-hour full cache clear)
- Async job state shared across instances
- Gemini cache registry sync (avoid duplicate cache creation)
- Shared state survives application deployments, instance failures, and scale events
- $24/month (GCP Memorystore, 1GB Basic × 2 environments)
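
To illustrate what per-key TTL means compared with a 24-hour full clear, here is a small in-memory model. It is not Redis and not the current implementation: `TtlCache`, `TtlDemo`, the keys, and the TTL values are hypothetical, and the clock is injected so the behaviour can be shown without waiting. Redis applies expiry like this on the server side.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical in-memory model of per-key expiry.
class TtlCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final LongSupplier clock; // supplies "now" in milliseconds

    TtlCache(LongSupplier clock) {
        this.clock = clock;
    }

    // Each key carries its own time to live.
    void put(String key, String value, long ttlMillis) {
        entries.put(key, new Entry(value, clock.getAsLong() + ttlMillis));
    }

    Optional<String> get(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(key, e); // lazy expiry; removes only if the entry is unchanged
            return Optional.empty();
        }
        return Optional.of(e.value);
    }
}

class TtlDemo {
    public static void main(String[] args) {
        long[] now = {0L};
        TtlCache cache = new TtlCache(() -> now[0]);

        cache.put("exchange-rate", "1.08", 60_000L);         // 1 minute
        cache.put("country-list", "US,DE,FR", 86_400_000L);  // 24 hours

        now[0] = 120_000L; // two minutes later
        System.out.println("exchange-rate present: " + cache.get("exchange-rate").isPresent());
        System.out.println("country-list present:  " + cache.get("country-list").isPresent());
    }
}
```

After two simulated minutes the short-lived key has expired while the long-lived key is still served, so volatile data can refresh often without clearing everything else.
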
---
## Questions to Consider
1. Are deployment-time "not found" errors acceptable?
2. Is cache duplication across instances a concern?
3. Could future features benefit from shared state infrastructure?
4. Is $24/month justified for current and potential future value?
---
## Cost
| Item | Monthly |
|------|---------|
| Current (session affinity) | $0 |
| Redis (1GB Basic × 2 environments) | $24 |
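**Note**: Spring Boot has built-in Redis support (`spring-boot-starter-data-redis`). For the Spring Cache layer, integration requires minimal code and is mainly configuration; moving AI job and async state into Redis would likely need some additional code.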
---
## Related Docs
- `CURRENT_CACHE_IMPLEMENTATION.md`
- `PROCESS_JOB_STATE_PERSISTENCE.md`
- `REDIS_USE_CASES.md`