
# Use Octane

Tasks to switch over to Octane:

- [ ] Install Octane package
- [ ] Install Octane config file
- [ ] Install OpenSwoole locally
- [ ] Install OpenSwoole in the base Docker image
- [ ] Publish a new version of base Docker image
- [ ] Update API image to use new base Docker image version
- [ ] Update API image to run Octane
- [ ] Update API proxy to reverse HTTP proxy
- [ ] Design how file watching works in a sane Dockerized way
- [ ] Figure out what parts of the codebase need to change
- [ ] Set `OCTANE_HTTPS` to true in staging environment
- [ ] Set `OCTANE_HTTPS` to true in production environment
- [ ] Set `PROXY_RESOLVER` to `127.0.0.1` in production/staging environment
- [ ] Remove PHP-FPM from Docker base image
- [ ] Change released API proxy in API ECS side-cars

Octane breaks multi-tenancy by introducing race conditions, so we would need to redesign how this works before switching it on.

## Multi-tenancy in a Non-blocking I/O environment

For brevity, a request/job will be called a 'message'. Controller actions, receivers and listeners will all be called 'handlers'. In effect, a 'message' is a message from the outside world and a 'handler' handles said message.

Problems:

* Octane is a non-blocking Laravel server that keeps the application in memory
* Leadflo's multi-tenancy solution relies on global state
* Octane combined with LF multi-tenancy leads to intermittent failures due to race conditions
* It appears that client scope is being reset between when a request is received and when multi-tenant services are consumed
* In addition, with multiple users, this means a request could be dangerously switched to a different client scope in a similar way

This means that multi-tenancy must be redesigned so that:

1. Each request/job is scoped to a particular client
2. All the resources used in the fulfilment of a request are scoped to a client
3. Use of stateful singletons is eliminated unless the state of those singletons has been designed to handle multiple tenants concurrently
4. We do not require a large amount of parameter drilling of context

There are a few means by which a message is scoped to a client:

1. JWT token for user-facing HTTP endpoints (aka the API)
2. A client ID URL parameter for integration-facing HTTP endpoints (aka webhooks)
3. A client ID property on commands / events (aka the bus)

A subset of handlers operate within the context of a tenant exclusively. If they receive a message without tenant-identification information, this is an error.

In addition, handlers may (and often do) form a tree of dependencies, of which some are expected to operate within the context of a tenant. They too would have similar dependencies. A subset of leaves of this dependency tree would represent resources scoped to a particular client context:

1. Client information, for example the client entity object/settings
2. Database, for example scoped via row-level security
3. Filesystem, for example scoped via S3 prefix along with an IAM
4. Volatile memory, for example scoped via key prefix
5. API client, for example scoped by Gmail OAuth2 token

With regards to the API client, this also depends on the context of the integration. The Outlook integration does not need a Gmail client.

These dependencies are injected into the constructor of the dependent. This is something that should be retained.

Right now, we only support scoping of the client object and scoping the DB connection.

Outstanding questions:

* How do we scope a DB connection to a client context?
* How do we scope objects to a client context?
* How do we ensure an object's dependencies are scoped to a client context?

The answer to all 3 questions must work within a non-blocking environment. Ultimately, the trickiest and most important question is injecting scoped dependencies.
Producing a single object or DB connection scoped within a client context is pretty trivial, but the use of a DI framework complicates matters.

We could implement a `Context` type like this:

```php
// Everything a handler needs to operate within one tenant.
interface ClientContext
{
    public function client(): Client;

    public function database(): DatabaseManager;
}

// Default context: any access fails loudly until a real scope is passed.
class NullContext implements ClientContext
{
    public function client(): never
    {
        throw new Error("Missing client scope");
    }

    public function database(): never
    {
        throw new Error("Missing client scope");
    }
}
```

But should we force:

* Support to depend on `Client`, which seems wrong architecturally
* And should we force all code to transitively depend on Client context? I mean it does anyway
* Maybe this should live in `Client`
* But given we already do this and the scope of the initiative is purely to improve multi-tenancy and introduce Octane, I'm sure this is okay.

Tenant-aware objects would:

1. Not be singletons
2. Have a `NullContext` by default
3. Be responsible for passing the context down to dependencies

This means that:

1. Each handler is scoped to a particular tenant
2. There is no use of race-condition-causing singletons
3. There is immediate feedback when context has not been passed

Passing of context could be abstracted away in future. The context is resolved via HTTP, the bus, events or queries.

One idea I had was to create an interface that tenant-context scoped modules use:

```php
interface ClientScoped
{
    // Bind this object (and its scoped dependencies) to a tenant.
    public function scope(ClientContext $ctx): void;
}
```

This would then have a `TClientScoped` trait which provides the mechanisms to pass scope along to dependencies. To prototype, we can:

1. Use reflection to fetch dependencies from properties
2. Filter those dependencies that implement `ClientScoped`
3. Pass scope to those implementing dependencies

This would work all the way down to leaves that depend on `DatabaseManager`, where that dependency will be removed and the leaf will use the DB from the passed context.
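
The three prototype steps can be sketched roughly as follows. This is a minimal, framework-free illustration rather than the final design: `ClientContext` is reduced here to a stand-in class that only carries a client ID, a nullable property stands in for `NullContext`, and `LeadRepository` and `ImportLeadsReceiver` are hypothetical names invented for the example. It assumes PHP 8.1+, where reflection can read private properties directly.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for the real context (assumption: only a client ID).
final class ClientContext
{
    public function __construct(public readonly string $clientId) {}
}

interface ClientScoped
{
    public function scope(ClientContext $ctx): void;
}

trait TClientScoped
{
    // Null until scope() is called; plays the role of NullContext here.
    private ?ClientContext $ctx = null;

    public function scope(ClientContext $ctx): void
    {
        if ($this->ctx === $ctx) {
            return; // already scoped; also stops cyclic dependencies looping
        }
        $this->ctx = $ctx;

        // 1. Use reflection to fetch dependencies from properties.
        foreach ((new ReflectionObject($this))->getProperties() as $property) {
            if ($property->isStatic() || !$property->isInitialized($this)) {
                continue;
            }
            $dependency = $property->getValue($this);
            // 2. Keep only dependencies that implement ClientScoped.
            if ($dependency instanceof ClientScoped) {
                // 3. Pass the scope along.
                $dependency->scope($ctx);
            }
        }
    }

    protected function context(): ClientContext
    {
        // Immediate feedback when no context has been passed.
        return $this->ctx ?? throw new LogicException('Missing client scope');
    }
}

// Hypothetical leaf that needs the client context.
final class LeadRepository implements ClientScoped
{
    use TClientScoped;

    public function tableKey(): string
    {
        return 'leads:' . $this->context()->clientId;
    }
}

// Hypothetical handler with a scoped dependency injected via the constructor.
final class ImportLeadsReceiver implements ClientScoped
{
    use TClientScoped;

    public function __construct(private LeadRepository $leads) {}

    public function handle(): string
    {
        return $this->leads->tableKey();
    }
}

$receiver = new ImportLeadsReceiver(new LeadRepository());
$receiver->scope(new ClientContext('client-42'));
echo $receiver->handle(), PHP_EOL; // prints "leads:client-42"
```

Scoping only the receiver is enough for the repository to see the same context, which is the property we want from the trait. One limitation of this sketch is that `getProperties()` does not return private properties declared on parent classes, so a real implementation would need to walk the class hierarchy.
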
If performance from use of reflection becomes a concern, we can implement some form of caching mechanism that's run once on deploy. We could possibly hook into Laravel dependency injection to build this class and it would, in effect, build a cached dependency tree of client-scoped objects for faster lookup.

The only outstanding problem would be how and where we build the context and pass it to the first dependency, which would be different depending on message types:

* **API Requests** would use the **JWT strategy** and would need to scope controllers, which is then passed on to events/commands/queries
* **Integration Requests** would use the **URL parameter strategy** and would need to scope controllers, which is then passed on to events/commands/queries
* **Commands** would use the **property strategy** and would need to scope receivers
* **Events** would use the **property strategy** and would need to scope listeners
* **Queries** would have to be scoped as dependencies because we manually dispatch queries to their resolvers

Another interesting but non-critical gain in productivity would be the creation of messages from:

1. The scope, e.g. automatically scoping a command to dispatch within the client context
2. Other messages, e.g. the conversion of a client-scoped request into a client-scoped query, event or command

Outstanding questions:

* How do we scope controllers from JWT tokens in middleware?
    * One idea might be to scope the request, using some kind of child type, and then provide a mechanism to scope from that request in controllers.
* How do we scope controllers from URL parameters or manually?
    * We could provide a general integration middleware that creates a scoped request from the URL parameter
* How do we scope receivers from commands in the bus?
    * And how do we do it so that receivers remain unit testable?
    * Receivers could be treated as a "scoped dependency" and the queue could scope them
    * For queued receivers, this would happen in the LegacyJob
* How do we scope listeners from events in events?
    * How do we scope asynchronous listeners?
    * And synchronous listeners?
* Should we be passing the DB around or should we build a connection from context?
    * No, we probably shouldn't be passing a DB connection around where it's not needed

The other issue is that using singletons and binding the client scope to them on all requests requires that all services that listen for the client be instantiated on every request, along with all of their dependencies.

The trickiest problem here would be scoping receivers whilst maintaining unit testability.

## Controllers

Controllers are the user-facing entrypoint into the API. The vast majority of HTTP calls will necessarily be scoped to a specific client context. For security and privacy reasons, we guarantee guard rails around the client's data so that they can only read and write their own data.

Another set of entrypoints are endpoints for the client's integrations. Those too will operate within a specific client context and must only read and write data within that context.

### JWT-scoped

Right now, scoping a user-facing HTTP request to a client's context involves:

1. Intercepting client-scoped endpoints with an auth middleware
2. Verifying the JWT token
3. If the JWT token is verified, the `ClientBound` event is dispatched

This works within the typical PHP request model but comes with problems:

1. All 'client-scope aware' services must be singletons
2. All client-scope aware services are instantiated to receive the scope

This is a heavy and concurrency-unsafe operation. Instead, we must somehow modify the request:

1. Intercept client-scoped endpoints with auth middleware
2. Verify the JWT token
3. Pass along an augmented `ClientRequest` that includes the context

In controllers, we can then provide support code to initiate scoping of dependencies.
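
The modified flow can be sketched without Laravel as below. Everything here is an assumption made for illustration: the shape of `ClientRequest`, the `scopeFromJwt` function, the reduced `ClientContext` that only carries a client ID, and the `$verify` callable, which stands in for real JWT verification and simply returns a client ID or `null`.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for the real context (assumption: only a client ID).
final class ClientContext
{
    public function __construct(public readonly string $clientId) {}
}

// Hypothetical augmented request: the original headers plus the context.
final class ClientRequest
{
    /** @param array<string, string> $headers */
    public function __construct(
        public readonly array $headers,
        public readonly ClientContext $context,
    ) {}
}

/**
 * Framework-free middleware sketch.
 *
 * @param array<string, string>           $headers incoming request headers
 * @param callable(string): ?string       $verify  hypothetical verifier: client ID, or null if invalid
 * @param callable(ClientRequest): string $next    the controller action
 */
function scopeFromJwt(array $headers, callable $verify, callable $next): string
{
    // 1. Intercept: pull the bearer token from the request.
    $header = $headers['Authorization'] ?? '';
    $token = str_starts_with($header, 'Bearer ') ? substr($header, 7) : '';

    // 2. Verify the token.
    $clientId = $token === '' ? null : $verify($token);
    if ($clientId === null) {
        return '401 Unauthorized';
    }

    // 3. Pass along an augmented request that carries the context.
    return $next(new ClientRequest($headers, new ClientContext($clientId)));
}

// Fake verifier for the example; a real one would check signature and claims.
$verify = fn (string $token): ?string => $token === 'valid-token' ? 'client-42' : null;
$controller = fn (ClientRequest $request): string => '200 OK for ' . $request->context->clientId;

echo scopeFromJwt(['Authorization' => 'Bearer valid-token'], $verify, $controller), PHP_EOL;
echo scopeFromJwt([], $verify, $controller), PHP_EOL;
```

The context travels with the request instead of being bound to shared services, so two concurrent requests never see each other's scope.
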
Though this augmented-request approach would not work if we're using `FormRequest`.

### URL-scoped

...

### Manually-scoped

...

## Receivers

If the receiver only accepted the scope/context, this would place the burden on the queueing mechanism. The advantage of this is that receivers remain unit testable and little needs to change in the majority of them. The disadvantage is that we need to be aware of this behaviour when migrating to a different queueing mechanism.

Receivers would have to implement `ClientScoped` along with `TClientScoped`.

Questions:

* What changes have to be made to scope receivers?
* What changes have to be made for the queue to scope receivers?
* Are there any receivers called directly?
* Are there any receivers that use the database within context?

## Resolvers

Resolvers use the DB. These would need a mechanism to take a client context and return a `DatabaseManager` or similar. They would receive their context as a dependency of whatever was calling them.

* How does a resolver obtain a DB connection from a context?
*

## Listeners

### Synchronous Listeners

...

### Async Listeners

...

## Resolving DB-connection from Scope

...

## Scoping infrastructure

### Queue

### Events