# IAM Login Module - Technical Documentation

> **Audience**: CTO / Technical Leadership
>
> **Last Updated**: 2026-01-19
>
> **Status**: Implementation Complete

---

## What is this document about?

This document explains how the login feature in Detective V3 is designed. We will cover:

1. **Why did V3 need to redesign the login flow?** - What problems in V2 needed to be fixed
2. **What are the different login scenarios?** - Three different login paths
3. **How does each scenario work?** - Complete API call sequences
4. **Key security designs** - Brute-force protection, Token management, and other details

---

## 1. Why Did V3 Need a Redesign?

### 1.1 Problems with V2

V2 (the PHP version) had several problems that led us to redesign:

**Problem 1: 2FA verification could be bypassed**

Here's how V2 worked: After a user entered their credentials, if the account required 2FA, the backend would return a sessionId and a "2FA required" flag. But the problem was, **the backend had already issued the Token at this point**.

What does this mean? If a frontend developer forgot to check the "2FA required" field, or if someone intentionally bypassed the frontend, they could use that Token to call other APIs directly. 2FA was effectively useless.

```
V2's problematic flow:

1. User enters credentials
2. Backend validates, returns { sessionId, token, requires2FA: true }
                                           ↑ This token is already usable!
3. Frontend "should" check requires2FA and redirect to 2FA page
4. But if frontend doesn't check, the token works anyway ← Problem here
```

**Problem 2: Firebase Functions don't have a fixed IP**

V2 was deployed on a server with a fixed IP address. Domainarium has IP whitelist settings, and only that server could call their APIs.

But V3 uses Firebase Functions (Serverless architecture), where the IP can change with each execution. We can't add all of Google Cloud's IPs to the whitelist (that would be the same as having no whitelist at all).

**Problem 3: Session management was too complex**

V2 used Laravel JWT, requiring us to handle Token expiration, Refresh Token, and Blacklist logic ourselves. This code is error-prone and time-consuming to maintain.

---

### 1.2 How V3 Solves These Problems

| Problem | V2's Approach | V3's Solution |
|---------|---------------|---------------|
| 2FA can be bypassed | Token issued after password validation | **Token delayed until 2FA is complete** |
| IP whitelist | Fixed IP server connects directly | **Route through Go Proxy** |
| Complex sessions | Write JWT logic ourselves | **Use Firebase Auth, let Google handle it** |

We'll explain each solution in detail below.

---

## 2. What Does the System Architecture Look Like?

First, let's look at a diagram to understand the overall architecture:

```
User (Browser/App)
        │
        │ HTTPS
        ▼
┌─────────────────────────────────────────────────────────┐
│  Detective V3 (Firebase Functions)                      │
│                                                         │
│  This is our code                                       │
│  - Login logic (LoginCommand)                           │
│  - 2FA verification (Verify2FACommand)                  │
│  - User data stored in Firestore                        │
└──────────────────────┬──────────────────────────────────┘
                       │
                       │ Encrypted connection (AES-256-GCM)
                       ▼
┌─────────────────────────────────────────────────────────┐
│  Go Proxy (deployed at 185.xxx.xxx.xxx)                 │
│                                                         │
│  This server's IP is on Domainarium's whitelist         │
│  Responsible for forwarding requests                    │
│  Also manages System Token (auto-refresh every 20 min)  │
└──────────────────────┬──────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────┐
│  Domainarium API                                        │
│                                                         │
│  IPTwins' main system                                   │
│  Stores user credentials, 2FA secrets, brand data       │
└─────────────────────────────────────────────────────────┘
```

### Why Do We Need Go Proxy?

Simply put, because Firebase Functions don't have a fixed IP.

Each time a Firebase Function is triggered, it may run on a different Google Cloud machine with a different IP. Domainarium's `validate_user` API (which validates user identity) has IP whitelist restrictions: only IPs on the list can call it.

Our solution: Rent a small server with a fixed IP, add that IP to the whitelist, and have Detective route through this server to call Domainarium. The program running on this server is "Go Proxy".

Go Proxy does three things:

1. **Forward requests**: Detective sends requests to the Proxy, which forwards them to Domainarium
2. **Encrypted transmission**: Communication between Detective and Proxy uses AES-256-GCM encryption to prevent sensitive data (passwords) from being intercepted
3. **Manage System Token**: Domainarium's API requires a "System Token" to call, and this Token expires every 30 minutes. The Proxy automatically refreshes it every 20 minutes and stores it in memory

---

## 3. Three Login Scenarios

After a user enters their username and password, we ask Domainarium: "Who is this user? Do they need 2FA?"

Domainarium returns a `user_2FA` field, and this field's value determines what happens next:

| `user_2FA` value | What it means | What to do next |
|------------------|---------------|-----------------|
| **A Base32 string** e.g., `"JBSWY3DPEHPK3PXP"` | This user has set up 2FA; this string is their secret key | Request verification code from Authenticator App |
| **Empty string `""`** | This user doesn't need 2FA | Issue Token directly, login complete |
| **The literal `"QR"`** | This user is assigned to use 2FA but hasn't scanned a QR Code yet | Generate QR Code for them to scan, issue Token after verification |

The detailed flow for each scenario is explained below.

---

## 4. Scenario 1: No 2FA Required, Direct Login

This is the simplest scenario. The user's account doesn't have 2FA enabled. After validating credentials, we issue the Token directly.

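The three-way branch on `user_2FA` from Section 3, which decides whether this direct-login path applies, could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `TwoFactorState` and `classifyUser2FA` are hypothetical names introduced here.

```typescript
// Hypothetical names for illustration only; not taken from the Detective codebase.
type TwoFactorState =
  | { kind: "none" }                     // ""   -> Scenario 1: issue the Token directly
  | { kind: "setup" }                    // "QR" -> Scenario 3: first-time 2FA setup
  | { kind: "verify"; secret: string };  // Base32 secret -> Scenario 2: ask for a code

function classifyUser2FA(user2FA: string): TwoFactorState {
  if (user2FA === "") {
    return { kind: "none" };
  }
  // Check the literal "QR" before treating the value as a secret:
  // "Q" and "R" are both valid Base32 characters.
  if (user2FA === "QR") {
    return { kind: "setup" };
  }
  return { kind: "verify", secret: user2FA };
}
```

Checking the `"QR"` literal before the fallback matters because the string `"QR"` would otherwise pass as a (very short) Base32 value.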
### Complete Flow

**Step 1: User submits credentials**

```
Frontend → Backend

POST /auth/login
{
  "email": "alice@example.com",
  "password": "mypassword123"
}
```

**Step 2: Backend asks Domainarium "Who is this person?"**

After receiving the credentials, the backend doesn't validate them itself (because we don't store passwords). Instead, it sends the credentials to Go Proxy, which forwards them to Domainarium.

Domainarium checks if the credentials are correct, and if so, returns user data:

```
Domainarium returns:
{
  "user_id": 12345,                      // User ID on Domainarium's side
  "user_email": "alice@example.com",
  "user_2FA": "",                        // Empty string = No 2FA required
  "portfolios": ["brand-A", "brand-B"]   // Brands this user can access
}
```

**Step 3: Backend creates or updates user data**

If this is the user's first time logging into Detective, we automatically create an account for them (this is called JIT Provisioning - Just-In-Time creation).

When creating:

- Based on the `portfolios` field, create the corresponding brands (if they don't exist yet)
- Assign this user a default role (`ClientAccount`, regular customer)

If the user already exists, we update their list of accessible brands.

**Step 4: Issue a Firebase Token to the user**

All validations passed. We use Firebase Admin SDK to generate a "Custom Token" containing the user's permission information (role, accessible brands, etc.).

```
Backend → Frontend

HTTP 200 OK
{
  "firebaseToken": "eyJhbGciOiJSUzI1NiIsInR5cCI6...",
  "requires2FA": false
}
```

After receiving this Token, the frontend uses Firebase SDK's `signInWithCustomToken()` to exchange it for the actual ID Token. All subsequent API calls use this ID Token.

### Why Use Firebase Token?

You might ask: Why add an extra layer of Firebase instead of issuing JWT ourselves?

Reasons:

1. **No need to handle Refresh Token ourselves**: Firebase SDK automatically refreshes Tokens in the background; frontend doesn't need to write any code
2. **No need to validate Tokens ourselves**: Backend just needs one line `verifyIdToken(token)` to validate, no custom signature-verification logic needed
3. **No need to handle logout ourselves**: Calling `revokeRefreshTokens(userId)` invalidates the user's Refresh Token (see Section 7 for what this means for the current ID Token)
4. **Security handled by Google**: ID Token signing and verification are handled by Google's infrastructure, more reliable than doing it ourselves

---

## 5. Scenario 2: 2FA Verification Required

This user has previously set up 2FA (scanned a QR Code, has it in their Authenticator App). They need to enter a 6-digit verification code when logging in.

### Complete Flow

**Steps 1 and 2: Same as Scenario 1**

Same credential submission, same query to Domainarium. The difference is that Domainarium's returned `user_2FA` is not empty:

```
Domainarium returns:
{
  "user_id": 12345,
  "user_email": "bob@example.com",
  "user_2FA": "JBSWY3DPEHPK3PXP",   // This is this user's TOTP secret key
  "portfolios": ["brand-C"]
}
```

**Step 3: Backend creates a "pending verification session"**

Since 2FA hasn't been verified yet, we **cannot issue a Token**. But we need to remember "this user's credentials have been verified, waiting for their 2FA code".

So we create a PendingSession and store it in Firestore:

```
PendingSession {
  id: "pending_abc123",              // This session's ID
  userId: "user_xyz",                // Corresponding user ID
  userEmail: "bob@example.com",
  totpSecret: "JBSWY3DPEHPK3PXP",    // TOTP secret for verification
  expiresAt: 5 minutes from now      // This session expires in 5 minutes
}
```

**Step 4: Tell frontend "2FA required"**

```
Backend → Frontend

HTTP 202 Accepted    // 202 means "received, but not yet complete"
{
  "pendingSessionId": "pending_abc123",
  "requires2FA": true
}
```

Note: **No firebaseToken is given** at this point. The user now has no Token and cannot call any other APIs.

This is the biggest difference between V3 and V2: **Token is delayed until 2FA verification is complete**.

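As a rough sketch of Steps 3 and 4, the pending session and the 202 response could be modeled like this. The field names follow the examples above, but `createPendingSession`, `isExpired`, and `toPendingResponse` are hypothetical helpers, and persistence to Firestore is deliberately left out.

```typescript
// Illustrative shapes only; helper names are assumptions, not the project's API.
interface PendingSession {
  id: string;
  userId: string;
  userEmail: string;
  totpSecret: string; // "" while first-time setup is still in progress
  expiresAt: number;  // epoch milliseconds
}

const PENDING_SESSION_TTL_MS = 5 * 60 * 1000; // 5 minutes, as described above

function createPendingSession(
  id: string,
  userId: string,
  userEmail: string,
  totpSecret: string,
  nowMs: number
): PendingSession {
  return {
    id: id,
    userId: userId,
    userEmail: userEmail,
    totpSecret: totpSecret,
    expiresAt: nowMs + PENDING_SESSION_TTL_MS,
  };
}

// A session is unusable once the current time reaches its expiry.
function isExpired(session: PendingSession, nowMs: number): boolean {
  return nowMs >= session.expiresAt;
}

// The 202 body exposes only the session ID: no firebaseToken and no totpSecret.
function toPendingResponse(session: PendingSession) {
  return {
    status: 202,
    body: { pendingSessionId: session.id, requires2FA: true },
  };
}
```

Keeping the secret and the Token out of the response body is what makes the frontend unable to skip the 2FA step, whether by mistake or on purpose.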
**Step 5: User opens Authenticator App and enters 6-digit code**

Frontend displays an input field for the user to enter the 6-digit number shown in Google Authenticator (or other TOTP App).

**Step 6: Frontend sends verification code to backend**

```
Frontend → Backend

POST /auth/2fa/verify
{
  "pendingSessionId": "pending_abc123",
  "code": "123456"
}
```

**Step 7: Backend verifies if this code is correct**

We retrieve the `totpSecret` stored earlier from PendingSession, then verify using the TOTP algorithm:

```typescript
import { verify } from 'otplib'

const isValid = verify({
  token: "123456",                    // User-entered code
  secret: pendingSession.totpSecret   // Secret we stored
})
```

How TOTP works: Both the App and server know the same secret key. From this secret and the current time, which is divided into 30-second steps, each side calculates a 6-digit number. As long as both clocks fall within the same 30-second step, the calculated numbers match.

**Step 8: Verification successful, now we issue the Token**

```
Backend → Frontend

HTTP 200 OK
{
  "firebaseToken": "eyJhbGciOiJSUzI1NiIsInR5cCI6..."
}
```

At the same time, we:

- Delete the PendingSession (it's been used)
- Write an Audit Log (recording this user's successful login)

### What if the verification code is wrong?

Return 401 Unauthorized. The user needs to re-enter the code. If they fail too many times, the account gets locked (see "Brute-force Protection" section later).

---

## 6. Scenario 3: First-time 2FA Setup

This user has been assigned to use 2FA (possibly by an admin), but hasn't set it up yet. Domainarium returns `"QR"`, meaning "they need to scan a QR Code to set up their Authenticator App first".

### Complete Flow

**Steps 1 and 2: Same as before**

```
Domainarium returns:
{
  "user_id": 12345,
  "user_email": "charlie@example.com",
  "user_2FA": "QR",                 // This special value means "2FA setup required"
  "portfolios": ["brand-D"]
}
```

**Step 3: Create PendingSession (secret field left empty)**

```
PendingSession {
  id: "pending_xyz789",
  userId: "user_abc",
  userEmail: "charlie@example.com",
  totpSecret: "",                   // Leave empty; will be filled after user scans QR Code
  expiresAt: 5 minutes from now
}
```

**Step 4: Tell frontend "2FA setup required"**

```
HTTP 202 Accepted
{
  "pendingSessionId": "pending_xyz789",
  "requires2FASetup": true          // Note: Setup, not verification
}
```

**Step 5: Frontend requests QR Code**

Frontend sees `requires2FASetup: true`, displays "Please set up 2FA" screen, then calls:

```
POST /auth/2fa/setup
{
  "pendingSessionId": "pending_xyz789"
}
```

**Step 6: Backend generates new TOTP secret and QR Code**

```typescript
import { generateSecret, generateURI } from 'otplib'

// Generate a Base32 format secret
const secret = generateSecret()   // e.g., "JBSWY3DPEHPK3PXP"

// Generate QR Code content (it's a URL)
const qrCodeUrl = generateURI({
  issuer: 'Detective',
  label: 'charlie@example.com',
  secret
})
// Result: "otpauth://totp/Detective:charlie@example.com?secret=JBSWY3DPEHPK3PXP&issuer=Detective"
```

At the same time, we store this secret in PendingSession:

```
PendingSession {
  id: "pending_xyz789",
  totpSecret: "JBSWY3DPEHPK3PXP",   // Now has a value
  ...
}
```

**Step 7: Return QR Code to frontend**

```
HTTP 200 OK
{
  "qrCodeUrl": "otpauth://totp/Detective:charlie@example.com?secret=...",
  "secret": "JBSWY3DPEHPK3PXP"      // Also return text version for manual entry
}
```

Frontend converts this URL to a QR Code image for display.

**Step 8: User scans QR Code with Authenticator App**

After scanning, their App will have a new entry "Detective - charlie@example.com", showing a 6-digit verification code that changes every 30 seconds.

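For reference, the 6-digit code the App displays can be reproduced with the standard TOTP algorithm (RFC 6238, here with HMAC-SHA1 and 30-second steps, the usual Authenticator defaults). The sketch below is an assumption-laden illustration, not the project's implementation: it uses only Node's built-in `crypto` module rather than `otplib`, and `base32Decode` and `totp` are names made up for this example.

```typescript
import { createHmac } from "node:crypto";

const BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

// Decode a Base32 string (RFC 4648, padding optional) into raw bytes.
function base32Decode(input: string): Buffer {
  const clean = input.toUpperCase().replace(/=+$/, "");
  const out: number[] = [];
  let bits = 0;
  let value = 0;
  for (let i = 0; i < clean.length; i++) {
    const idx = BASE32_ALPHABET.indexOf(clean.charAt(i));
    if (idx < 0) throw new Error("invalid Base32 character: " + clean.charAt(i));
    value = (value << 5) | idx;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      out.push((value >>> bits) & 0xff);
      value &= (1 << bits) - 1; // keep only the leftover low bits
    }
  }
  return Buffer.from(out);
}

// Compute the TOTP code for a given Unix time (in seconds).
function totp(secret: Buffer, unixSeconds: number, stepSeconds = 30, digits = 6): string {
  // The moving factor is the number of whole time steps since the epoch.
  const counter = Math.floor(unixSeconds / stepSeconds);
  const msg = Buffer.alloc(8); // 8-byte big-endian counter
  msg.writeUInt32BE(Math.floor(counter / 0x100000000), 0);
  msg.writeUInt32BE(counter % 0x100000000, 4);

  const mac = createHmac("sha1", secret).update(msg).digest();

  // Dynamic truncation (RFC 4226): take 31 bits starting at a MAC-derived offset.
  const offset = mac[mac.length - 1] & 0x0f;
  const binary =
    ((mac[offset] & 0x7f) << 24) |
    (mac[offset + 1] << 16) |
    (mac[offset + 2] << 8) |
    mac[offset + 3];

  let code = String(binary % Math.pow(10, digits));
  while (code.length < digits) code = "0" + code; // left-pad with zeros
  return code;
}

// RFC 6238 test secret "12345678901234567890", Base32-encoded.
const demoSecret = base32Decode("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ");
console.log(totp(demoSecret, 59)); // prints "287082"
```

Because the counter only changes when the clock crosses a 30-second boundary, the App and the backend produce the same code for any two timestamps inside the same step.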
**Step 9: User enters verification code to confirm setup was successful**

```
POST /auth/2fa/confirm
{
  "pendingSessionId": "pending_xyz789",
  "code": "654321"
}
```

**Step 10: Backend verifies code and saves secret back to Domainarium**

After verification passes, we need to save this secret to Domainarium (because next time they log in, Domainarium needs to return this secret to us):

```
Backend → Go Proxy → Domainarium

Call user_qr API
{
  "email": "charlie@example.com",
  "secret": "JBSWY3DPEHPK3PXP"
}
```

**Why store the secret in Domainarium instead of Detective?**

Because Domainarium is the "Single Source of Truth" for user data. If users use both V2 and V3, or if there's a V4 in the future, storing the secret in one place prevents confusion.

Also, this approach means Detective doesn't need to permanently store sensitive data. The secret only exists temporarily in PendingSession during the login flow, and is deleted once login completes.

**Step 11: Issue Token, login complete**

```
HTTP 200 OK
{
  "firebaseToken": "eyJhbGciOiJSUzI1NiIsInR5cCI6..."
}
```

---

## 7. Logout Flow

Logout is relatively simple, but there's one detail worth explaining.

### Flow

```
Frontend → Backend

POST /auth/logout
Authorization: Bearer <user's Firebase ID Token>
```

The backend does two things:

1. **Revoke Refresh Token**: Calls Firebase's `revokeRefreshTokens(userId)`. This invalidates the user's Refresh Token, preventing them from obtaining new ID Tokens.
2. **Write Audit Log**: Records that this user logged out.

### Important Note

After the Refresh Token is revoked, **the user's current ID Token is still valid** until it expires, which can be up to 1 hour later (this is Firebase's design).

If we need to invalidate the Token "immediately", we can pass `true` as the `checkRevoked` argument of `verifyIdToken()` when verifying Tokens, but this adds an extra request to the Firebase Auth backend for every verification. We currently don't do this.

---

## 8. Brute-force Protection

If someone tries to log in repeatedly with wrong passwords or verification codes, we need to stop them.

### Rules

| Setting | Value |
|---------|-------|
| Maximum failed attempts | 5 times |
| Lock duration after 5 failures | 15 minutes |
| What happens during lockout | Direct rejection, regardless of credentials |

### Technical Challenge: Race Condition

Firebase Functions are Serverless, and many instances may run simultaneously. Suppose someone rapidly sends 10 incorrect login attempts:

```
Instance A: Read user (failedAttempts = 0)
Instance B: Read user (failedAttempts = 0)
Instance C: Read user (failedAttempts = 0)
...
Instance A: failedAttempts = 0 + 1 = 1, save
Instance B: failedAttempts = 0 + 1 = 1, save
Instance C: failedAttempts = 0 + 1 = 1, save
```

The final `failedAttempts` is 1, not 3. Because concurrent failures are undercounted, the attacker can keep guessing well past the 5-attempt limit.

### Solution: Atomic Increment

Instead of "read → add one → write back", we tell the database directly to "add one for me", so the increment happens inside the database as a single atomic operation:

```typescript
import { FieldValue } from 'firebase-admin/firestore'

// Firestore: atomic server-side increment (`doc` is the user's DocumentReference).
// update() does not return the new value, so checking the count afterwards
// needs a follow-up read, or the increment and check can share a transaction.
await doc.update({ failedLoginAttempts: FieldValue.increment(1) })

// PostgreSQL equivalent (SQL shown as a comment; RETURNING yields the new count):
//   UPDATE users SET failed_attempts = failed_attempts + 1
//   WHERE id = '...' RETURNING failed_attempts
```

This way, even if multiple instances execute simultaneously, each increment actually adds one.

### Event-Driven Architecture

When login fails, we don't handle lockout logic directly in LoginCommand. Instead, we emit a "login failed" event:

```
LoginCommand detects wrong password
        ↓
Publish LoginFailedEvent
        ↓
OnLoginFailedCheckLockoutPolicy receives event
        ↓
1. Use Atomic Increment to increase failure count
2. If >= 5, lock the account
3. Write Audit Log
```

Why separate it this way? Because "login" and "brute-force protection" are two different concerns. This design is easier to test and maintain.

---

## 9. Token Lifecycle

Finally, let's summarize the validity period and management approach for various tokens:

| Token Type | Validity | Who Manages It | Notes |
|------------|----------|----------------|-------|
| Firebase Custom Token | Expires within 1 hour; used once | We generate, discard after use | Used to exchange for ID Token; not needed after exchange |
| Firebase ID Token | 1 hour | Firebase SDK auto-refresh | Frontend uses this to call APIs |
| Firebase Refresh Token | Very long (weeks) | Firebase SDK | Used to get new ID Tokens |
| Domainarium User Token | ~30 minutes | **We don't manage it** | Use once during login then discard, same as V2 |
| Domainarium System Token | ~30 minutes | Go Proxy refreshes every 20 min | Required to call Domainarium APIs |
| PendingSession | 5 minutes | We manage | Used for 2FA verification, deleted after verification |

### Why Don't We Maintain Domainarium User Token?

You might think: Domainarium gives us a User Token after successful login. Why don't we store it and refresh when it expires?

We confirmed with the PHP Team: V2 doesn't do this either. After login completes, subsequent calls to Domainarium APIs (like getting brand data) all use System Token; User Token isn't needed.

So we can discard the User Token after using it.

---

## 10. JIT Provisioning (Just-In-Time User Creation)

If a user is logging into Detective for the first time, we automatically create an account for them. No admin needs to manually create it beforehand.

### Flow

1. **Validate credentials**: First confirm Domainarium recognizes this user
2. **Create brands**: Based on the `portfolios` this user can access, create corresponding brands (if they don't exist yet)
3. **Create user**: Create a user record in Detective, linked to these brands
4. **Assign role**: Give this user a default role `ClientAccount`

### Role Mapping

V2 and V3 have different role names. Here's the mapping:

| V2 Role | V3 Role | Description |
|---------|---------|-------------|
| `admin` | `SystemAdmin` | Highest privileges |
| `technical` | `SystemAdmin` | Merged into SystemAdmin (same permissions in V2) |
| `manager` | `ServiceManager` | Manage accounts and tasks |
| `client` | `ClientAccount` | Regular customer (default role for JIT creation) |
| `user` | `ExternalPartner` | External partner, minimal permissions |

---

## 11. Quick Reference Table

| Question | Answer |
|----------|--------|
| Where is 2FA secret stored? | Domainarium (Detective only temporarily stores in PendingSession) |
| How long before PendingSession expires? | 5 minutes |
| How long before login Token expires? | Firebase ID Token 1 hour, but SDK auto-refreshes |
| How many failed logins before lockout? | 5 times |
| How long is the lockout? | 15 minutes |
| What role is assigned to new users? | ClientAccount |
| Is the secret saved to Domainarium on first 2FA setup? | Yes, after verification code is confirmed |

---

## 12. Go Proxy Deployment Plan

### Current Status

The IAM login module code is mostly complete. However, Go Proxy hasn't been deployed to production yet. We're temporarily using **Mock Mode** to simulate Domainarium responses.

What is Mock Mode? It means Go Proxy doesn't actually call Domainarium but returns fake data instead. This allows us to continue developing other features (like brand management, monitoring tasks, etc.) without being blocked by Domainarium's IP restrictions.

### Why Deploy Early

Although Mock Mode allows development to continue, we recommend **deploying to production as early as possible** because:

1. **Verify architecture feasibility**: Confirm Go Proxy works properly in a real environment without network, permissions, or performance issues
2. **Discover problems early**: If deployment encounters issues, there's more time to fix them
3. **Integration testing**: Testing with real Domainarium data is more reliable than fake data

### Deployment Requires Three Steps

| Step | Description | Owner |
|------|-------------|-------|
| **1. Deploy Go Proxy** | Choose a GCP service for deployment, e.g.: • **Cloud Run**: Simplest, but IP not fixed, requires NAT • **App Engine (GAE)**: Also requires NAT setup • **Compute Engine (GCE)**: Can have fixed External IP | Us |
| **2. Set up fixed IP** | If using Cloud Run or GAE, need to configure **Cloud NAT** to get a fixed Outbound IP. If using GCE, can directly assign a Static External IP. | Us |
| **3. Add to whitelist** | Give the fixed IP to Eric and ask him to add it to Domainarium API's whitelist. | Eric (IPTwins) |

### Deployment Options Comparison

| Option | Pros | Cons | Estimated Cost |
|--------|------|------|----------------|
| **Cloud Run + Cloud NAT** | Serverless, auto-scaling, easy deployment | NAT setup somewhat complex, NAT has extra fees | ~$10-30/month |
| **App Engine + Cloud NAT** | Stable, has version management | More complex setup than Cloud Run | ~$10-30/month |
| **Compute Engine (GCE)** | Easiest to get fixed IP, full control | Need to manage server ourselves, no auto-scaling | ~$5-20/month (small machine) |

### Recommended Approach

1. First deploy Go Proxy on the **smallest GCE spec** (e2-micro), assign it a fixed IP directly, and verify the entire flow works
2. After confirming it works, if traffic increases, evaluate whether to switch to Cloud Run + Cloud NAT

### Action Items

- [ ] Decide on deployment platform (Cloud Run / GAE / GCE)
- [ ] Deploy Go Proxy to GCP
- [ ] Set up fixed IP (Cloud NAT or Static IP)
- [ ] After getting fixed IP, contact Eric to add to whitelist
- [ ] Disable Mock Mode, test with real Domainarium

---

## Related Documents

For more detailed design decision records, refer to:

- ADR-012: Why use Firebase Auth
- ADR-013: V2 → V3 Role Mapping Strategy
- ADR-014: Why not maintain Domainarium User Token
- ADR-015: First-time 2FA Setup Flow
- ADR-016: Token Revocation on Logout

These documents are in `.context/decisions.md`.