# Queue Overload - Backpressure

## Navigation

1. [Problem](https://hackmd.io/@jwdunne/HJXyhNY4h)
2. [Observability](https://hackmd.io/@jwdunne/S1pJ1CgHn)
3. [Testing](https://hackmd.io/@jwdunne/H1zKkAeSn)
4. [Throughput optimisation opportunities](https://hackmd.io/@jwdunne/H1h2k0xH3)
5. [Backpressure](https://hackmd.io/@jwdunne/B1WZeCeBh)
6. [Load shedding](https://hackmd.io/@jwdunne/BJB4MReH2)
7. [Autoscaling](https://hackmd.io/@jwdunne/Bkw_zAxHn)

## Solution

There are two ways we can implement backpressure in times of overload:

1. Delaying the dispatch of messages
2. Increasing the interval of heartbeat listeners

## Delaying message dispatch

### Integration messages

We are able to:

- Add 3 seconds of latency to Outlook webhook responses, which will prompt Outlook to throttle change notifications for 10 minutes
- Reject a percentage of Gmail webhook requests to prompt push backoff, which slows down push notification delivery
- Add URL parameters to the webhook URL when sending messages that increase the time Twilio will wait before sending the webhook notification

We should delay the response in these cases so that the backpressure is pushed onto the integrated system. These controllers would use `QueueLoadGateway` to test if the queue is overloaded and then apply backpressure as needed.

### User-produced messages

If we can dispatch messages _after_ the response is sent to the browser, we could delay these messages for 1 second:

- `ReadConversation`
- `PatientOpportunityChanged`
- `SendSMS`
- `SendEmail`

The queue/event bus could be responsible for applying backpressure. The events and commands could have a method for specifying backpressure:

```php
/**
 * Returns the amount of backpressure to apply in milliseconds.
 */
public function backpressure(bool $overloaded): int
{
    return $overloaded ? 2_000 : 1_000;
}
```

The default implementation would provide zero backpressure, which would have no effect.
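
One way to get a zero default while letting individual messages opt in to the method above is an interface paired with a trait. This is only a sketch: `AppliesBackpressure`, `NoBackpressure` and `MessageWithDefault` are hypothetical names, and `SendSMS` here is a stripped-down stand-in for the real command.

```php
<?php

/** Hypothetical contract that the queue/event bus could check for. */
interface AppliesBackpressure
{
    /** Returns the amount of backpressure to apply in milliseconds. */
    public function backpressure(bool $overloaded): int;
}

/** Hypothetical default: zero backpressure, so dispatch is never delayed. */
trait NoBackpressure
{
    public function backpressure(bool $overloaded): int
    {
        return 0;
    }
}

/** A message that keeps the zero default. */
final class MessageWithDefault implements AppliesBackpressure
{
    use NoBackpressure;
}

/** A message that opts in: 1 second normally, 2 seconds when overloaded. */
final class SendSMS implements AppliesBackpressure
{
    public function backpressure(bool $overloaded): int
    {
        return $overloaded ? 2_000 : 1_000;
    }
}
```

With this shape, the bus only needs to know about `backpressure()`, and existing messages stay unaffected until they override the default.
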

If multiple jobs and events are passed, we should apply the maximum backpressure out of all of the messages, as opposed to delaying each message by its own backpressure. Delaying each message individually would stack the delays and create a serious user experience issue.

We would also need to extend `Clock` or provide an interface to make sleeping testable, so that we can use a non-sleeping version in testing.

## Increasing heartbeat intervals

Another way of introducing backpressure and slowing down the production of messages is to lengthen the interval at which scheduled listeners respond to heartbeat events.

These listeners _typically_ accept `Heartbeat` events and then dispatch further messages on a per-client basis, meaning that their raison d'être is to fan out messages: the more clients we have, the more fan-out. We could slow them down by:

- Increasing the integration sync interval to 10 to 15 minutes
- Increasing the automation processing interval to 30 minutes when overloading and shedding entirely in states of overload
- Increasing the refresh token interval to 20 minutes

We can refactor these listeners to a `Scheduled` trait and then demand that `QueueLoadGateway` is provided as a dependency:

```php
use Cron\CronExpression;

trait Scheduled
{
    // The using class must assign this in its constructor.
    private readonly QueueLoadGateway $queueLoad;

    abstract private function execute(): void;

    // Override to return true in listeners that should shed load when overloaded.
    // (A readonly property cannot carry a default value, so this is a method.)
    private function shedsWhenOverloaded(): bool
    {
        return false;
    }

    private function defaultInterval(): CronExpression
    {
        return new CronExpression('@everyFiveMins');
    }

    private function overloadingInterval(): CronExpression
    {
        return $this->defaultInterval();
    }

    private function overloadedInterval(): CronExpression
    {
        return $this->defaultInterval();
    }

    public function handle(Heartbeat $heartbeat): void
    {
        if (! $this->shouldQueue($heartbeat)) {
            return;
        }

        $this->execute();
    }

    public function shouldQueue(Heartbeat $heartbeat): bool
    {
        $load = $this->queueLoad->get();

        // Check the more severe state first in case the two states overlap.
        if ($load->overloaded()) {
            return ! $this->shedsWhenOverloaded()
                && $heartbeat->cron($this->overloadedInterval());
        }

        if ($load->overloading()) {
            return $heartbeat->cron($this->overloadingInterval());
        }

        return $heartbeat->cron($this->defaultInterval());
    }
}
```

Perhaps we should allow returning crontab format and somehow parse it. Perhaps the default interval then defaults to 5 minutes, or to a permissive crontab that always returns true.

This library should help us test crontab expressions and make heartbeat scheduling more flexible going forward: https://github.com/dragonmantank/cron-expression. Laravel uses it for its own scheduler.

I recommend that we register some aliases:

- `@everyFiveMins` - `*/5 * * * *`
- `@everyTenMins` - `*/10 * * * *`
- `@quarterHourly` - `*/15 * * * *`
- `@halfHourly` - `*/30 * * * *`

We would do this in the `Support` namespace.

## Implementation

- Implement `Clock::sleep` method that accepts a number of milliseconds
- Update integration controllers to apply configurable backpressure to webhook responses
- Install `CronExpression` and configure aliases
- Implement a `Scheduled` trait that encapsulates lengthening heartbeat intervals
- Apply dynamic interval of 10-15 minutes to `ProcessSyncQueue`
- Apply dynamic interval of 10-15 minutes to `StartProcessingWorkflows`
- Apply dynamic interval of 30 minutes to `ScheduleRefreshTokens`
- Apply dynamic interval of 2 hours to `ProcessFeedbackQueue` and make it shed load
- Apply dynamic interval of 2 hours to `TimeoutSyncingIntegrations` and make it shed load
- Apply dynamic interval of 2 hours to `ScheduleHealthchecks` and make it shed load
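
The first item, combined with the earlier rule of applying only the largest backpressure across a batch, could look roughly like the sketch below. `Sleeper`, `RealSleeper`, `RecordingSleeper` and `maxBackpressure()` are hypothetical names standing in for whatever `Clock::sleep` ends up being; the messages are assumed to be any objects exposing the `backpressure(bool $overloaded): int` method described earlier.

```php
<?php

/** Hypothetical sleeping contract, standing in for a `Clock::sleep` method. */
interface Sleeper
{
    public function sleep(int $milliseconds): void;
}

/** Production version: actually blocks the process. */
final class RealSleeper implements Sleeper
{
    public function sleep(int $milliseconds): void
    {
        if ($milliseconds > 0) {
            usleep($milliseconds * 1_000); // usleep() takes microseconds
        }
    }
}

/** Test version: records the requested sleeps without blocking. */
final class RecordingSleeper implements Sleeper
{
    /** @var list<int> */
    public array $slept = [];

    public function sleep(int $milliseconds): void
    {
        $this->slept[] = $milliseconds;
    }
}

/**
 * Returns the single delay to apply before dispatching a batch: the largest
 * backpressure any message asks for, not the sum of all of them.
 *
 * @param iterable<object> $messages objects with a backpressure(bool): int method
 */
function maxBackpressure(iterable $messages, bool $overloaded): int
{
    $max = 0;
    foreach ($messages as $message) {
        $max = max($max, $message->backpressure($overloaded));
    }

    return $max;
}

// Usage with two stand-in messages: one delays, one keeps the zero default.
$messages = [
    new class {
        public function backpressure(bool $overloaded): int
        {
            return $overloaded ? 2_000 : 1_000;
        }
    },
    new class {
        public function backpressure(bool $overloaded): int
        {
            return 0;
        }
    },
];

$sleeper = new RecordingSleeper();
$sleeper->sleep(maxBackpressure($messages, true)); // one sleep for the whole batch
```

In tests the recording version lets us assert on the requested delay without slowing the suite, while production code would receive `RealSleeper` (or the extended `Clock`).
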