# Redis Assessment

> **Context**: Internal system, <100 users, 2-3 production instances, session affinity enabled

---

## Current State

| Component | Storage | Shared? |
|-----------|---------|---------|
| Spring Cache (20+) | ConcurrentHashMap | No |
| AI Jobs | In-memory only | No |
| Async Futures | ConcurrentHashMap | No |
| Business Processes | Oracle DB | Yes |
| **Session Affinity** | Load Balancer | ✅ Enabled |

---

## Session Affinity (Already Implemented)

Session affinity routes the same user to the same instance, which mitigates:

- ✅ AI Jobs "not found" (same user → same instance)
- ✅ Async Futures "not found" (same user → same instance)

Session affinity does NOT solve:

- ❌ Cache duplication (each instance still has own cache)
- ❌ Rolling deployments (instance replaced → state lost)
- ❌ Instance failure (user re-routed → state lost)
- ❌ Scale up/down (user may be re-routed → state lost)
- ❌ Uneven load distribution (active users cluster on same instances)
- ❌ Network/device changes (user IP changes → may lose affinity)
- ❌ Cache eviction coordination (can't evict across all instances)

---

## Remaining Issues

### Cache Duplication

Each instance caches independently:

- Same data cached 2-3 times across instances
- 24-hour full cache clear (no per-key TTL)
- Memory usage multiplied by instance count

### Gemini Cache Registry Mismatch

Each instance has its own local registry of Gemini cached content:

- Instance A creates Gemini cache → stores in local registry
- Instance B doesn't know about it → may create duplicate cache
- Result: Duplicate Gemini cache creation, increased costs

### Deployments / Scale Events

```
User starts job → Instance A
Deployment/scale event → Instance A replaced or removed
User polls → Routed to different instance → "not found"
```

---

## Options

### Option 1: Keep Current (Session Affinity)

- Works for most normal operations
- Issues during deployments, instance failures, or scale up/down
- Cache duplication acceptable for current scale

### Option 2: Add Redis

- Shared cache across instances
- Cache changes (evict/warm-up) affect all instances at once
- Per-key TTL (vs current 24-hour full cache clear)
- Async state sync across VMs
- Gemini cache registry sync (avoid duplicate cache creation)
- Survives deployments, failures, and scale events
- $24/month (GCP Memorystore)

---

## Questions to Consider

1. Are deployment-time "not found" errors acceptable?
2. Is cache duplication across instances a concern?
3. Could future features benefit from shared state infrastructure?
4. Is $24/month justified for current and potential future value?

---

## Cost

| Item | Monthly |
|------|---------|
| Current (session affinity) | $0 |
| Redis (1GB Basic × 2 environments) | $24 |

**Note**: Spring Boot has built-in Redis support (`spring-boot-starter-data-redis`). Integration requires minimal code - mainly configuration.

---

## Related Docs

- `CURRENT_CACHE_IMPLEMENTATION.md`
- `PROCESS_JOB_STATE_PERSISTENCE.md`
- `REDIS_USE_CASES.md`
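
---

## Illustration: Local vs Shared Job State

The "not found" scenario under Deployments / Scale Events can be reproduced in a few lines. The sketch below is a simplified model, not the actual service code: `JobLookupSketch`, `Instance`, and the job IDs are hypothetical names, and a plain `ConcurrentHashMap` shared between two objects stands in for Redis. It only shows why a job registered in one instance's local map is invisible to another instance, while a job registered in a shared store is not.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class JobLookupSketch {

    /** Hypothetical stand-in for one application instance. */
    static final class Instance {
        // Per-instance state, as with the current in-memory AI job storage.
        private final Map<String, String> localJobs = new ConcurrentHashMap<>();
        // State visible to every instance (assumption: Redis would fill this role).
        private final Map<String, String> sharedJobs;

        Instance(Map<String, String> sharedJobs) {
            this.sharedJobs = sharedJobs;
        }

        void startJobLocal(String jobId) {
            localJobs.put(jobId, "RUNNING");
        }

        void startJobShared(String jobId) {
            sharedJobs.put(jobId, "RUNNING");
        }

        Optional<String> pollLocal(String jobId) {
            return Optional.ofNullable(localJobs.get(jobId));
        }

        Optional<String> pollShared(String jobId) {
            return Optional.ofNullable(sharedJobs.get(jobId));
        }
    }

    public static void main(String[] args) {
        Map<String, String> shared = new ConcurrentHashMap<>();
        Instance a = new Instance(shared);
        Instance b = new Instance(shared);

        // The user starts both jobs on instance A.
        a.startJobLocal("job-1");
        a.startJobShared("job-2");

        // Instance A is replaced or removed; the user's polls now reach instance B.
        System.out.println("local:  " + b.pollLocal("job-1").orElse("not found"));
        System.out.println("shared: " + b.pollShared("job-2").orElse("not found"));
    }
}
```

Running `main` prints `local:  not found` followed by `shared: RUNNING`. Session affinity hides the first case during normal operation because polls keep reaching instance A; it stops helping once A is gone.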