# Queue Overload - Testing
## Navigation
1. [Problem](https://hackmd.io/@jwdunne/HJXyhNY4h)
2. [Observability](https://hackmd.io/@jwdunne/S1pJ1CgHn)
3. [Testing](https://hackmd.io/@jwdunne/H1zKkAeSn)
4. [Throughput optimisation opportunities](https://hackmd.io/@jwdunne/H1h2k0xH3)
5. [Backpressure](https://hackmd.io/@jwdunne/B1WZeCeBh)
6. [Load shedding](https://hackmd.io/@jwdunne/BJB4MReH2)
7. [Autoscaling](https://hackmd.io/@jwdunne/Bkw_zAxHn)
## Solution
Simulating queue overload as a "fire drill", taking inspiration from Netflix's Chaos Monkey, would test our overload-handling mechanisms. This could be run once a month and whenever else it is needed.
A “busywork machine” could generate $N$ jobs at a variable rate over a fixed timespan, each occupying a worker for a variable amount of time. This should be enough to trigger:
* Alerts
* Overloading mechanisms
* Overloaded mechanisms
* Auto-scaling
We would set a configurable upper limit on how long the process generates jobs for so that it doesn't cause hours of disruption.
One busywork job could be designed to occupy a worker indefinitely, or cause it to terminate.
This would tell us whether our mechanisms are working, whether they need fine-tuning, and whether we have missed implementing them in new code.
## Interface
This should be a CLI command:
```bash
php artisan leadflo:busywork [--overload] [--timeout=seconds] [--kill-worker]
```
By default, the command stops after 15 minutes or once the queue reaches an 'overloading' state, whichever comes first. The optional `--overload` flag makes it continue until the queue reaches an 'overloaded' state instead. The `--timeout` option sets a longer or shorter timeout, in seconds. The `--kill-worker` flag kills a worker instead of occupying it for the duration.
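The real command would be declared through Artisan's signature syntax. The sketch below instead mirrors the proposed options with Python's `argparse`, so the defaults and the stop condition can be checked in isolation. The function names `parse_busywork_args` and `target_state` are hypothetical, and the 900-second default simply encodes the 15-minute timeout described above.

```python
import argparse

# 15-minute default timeout from the design, in seconds.
DEFAULT_TIMEOUT_SECONDS = 15 * 60


def parse_busywork_args(argv):
    """Hypothetical mirror of: leadflo:busywork [--overload] [--timeout=seconds] [--kill-worker]"""
    parser = argparse.ArgumentParser(prog="leadflo:busywork")
    # Run until the queue is 'overloaded' instead of stopping at 'overloading'.
    parser.add_argument("--overload", action="store_true")
    # Upper limit on how long jobs are generated for, in seconds.
    parser.add_argument("--timeout", type=int, default=DEFAULT_TIMEOUT_SECONDS)
    # Kill one worker instead of occupying it until the timeout.
    parser.add_argument("--kill-worker", action="store_true")
    args = parser.parse_args(argv)
    if args.timeout <= 0:
        parser.error("--timeout must be a positive number of seconds")
    return args


def target_state(args):
    """Queue state at which the run stops early, unless the timeout is hit first."""
    return "overloaded" if args.overload else "overloading"
```

For example, `parse_busywork_args(["--timeout=300", "--kill-worker"])` yields a 300-second run that kills one worker and stops at the 'overloading' state.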
This command will:
1. Dispatch one command that is intended to occupy a single worker for the timeout duration (or kill it outright)
2. Continuously dispatch commands that each occupy a worker for a random duration of one to four seconds, until the queue reaches the desired state

Between iterations, the command will wait for a random number of seconds between 0 and an upper bound equal to the job queued rate measured at the start of the process.
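The dispatch loop can be sketched roughly as follows. It is written in Python so it runs standalone, and it assumes the caller supplies `dispatch` (push a named command onto the queue) and `queue_state` (read the current state from our monitoring). Both callables, the `normal` state name and the `max_wait_seconds` bound are hypothetical stand-ins. `clock` and `sleep` are injectable so the loop can be exercised without waiting in real time.

```python
import random
import time
from typing import Callable, Optional

# Hypothetical queue states, ordered from healthy to worst.
STATE_ORDER = ["normal", "overloading", "overloaded"]


def run_busywork(
    dispatch: Callable[[str, dict], None],
    queue_state: Callable[[], str],
    target: str,
    timeout: float,
    kill_worker: bool,
    max_wait_seconds: float,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> str:
    """Dispatch busywork until the queue reaches `target` or `timeout` seconds pass."""
    if rng is None:
        rng = random.Random()
    start = clock()

    # Step 1: a single job that ties up (or kills) one worker.
    dispatch("OccupyWorker", {"start_time": start, "timeout": timeout, "kill": kill_worker})

    # Step 2: keep swarming with short jobs until the target state or the timeout.
    while clock() - start < timeout:
        if STATE_ORDER.index(queue_state()) >= STATE_ORDER.index(target):
            return "target_reached"
        dispatch("SwarmWorker", {
            "start_time": start,
            "global_timeout": timeout,
            "work_timeout": rng.uniform(1, 4),  # one to four seconds of work
        })
        # Random pause between iterations, bounded by max_wait_seconds.
        sleep(rng.uniform(0, max_wait_seconds))
    return "timed_out"
```

Treating the states as ordered means a run aimed at 'overloading' also stops if the queue jumps straight to 'overloaded'.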
## Commands
Each "busywork" command would accept the run's start time and the timeout value.

There will be an `OccupyWorker` command with a boolean property `kill`, which is `false` by default. By default, it occupies a worker until the timeout. If `kill` is `true`, the command instead builds a string large enough that the OOM killer kills the worker. If PHP's `memory_limit` is set, the worker will die from a fatal memory-exhausted error before that point, which still terminates it.

There will also be a `SwarmWorker` command. By default, it occupies a worker for `workTimeout` seconds. If the current time is later than `startTime + globalTimeout` when the job runs, the job is skipped.
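A minimal Python sketch of the `OccupyWorker` receiver logic follows; it is not the PHP implementation. The snake_case field names and the `on_kill` hook are hypothetical. The hook only exists so the kill path can be tested without actually exhausting memory. Sleeping for the time remaining in the run, rather than the full timeout, keeps the occupied worker from outliving the drill if the job is picked up late.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class OccupyWorker:
    start_time: float    # when the busywork run began (epoch seconds)
    timeout: float       # global timeout for the run, in seconds
    kill: bool = False   # if True, exhaust memory instead of sleeping


def exhaust_memory() -> None:
    """Double a string until allocation fails (MemoryError) or the OOM killer ends the process."""
    s = "x" * 1024
    while True:
        s += s


def handle_occupy_worker(
    cmd: OccupyWorker,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
    on_kill: Optional[Callable[[], None]] = None,
) -> None:
    if cmd.kill:
        # Hypothetical hook: tests pass on_kill to avoid really exhausting memory.
        (on_kill or exhaust_memory)()
        return
    # Occupy the worker only for what is left of the global timeout.
    remaining = cmd.start_time + cmd.timeout - clock()
    if remaining > 0:
        sleep(remaining)
```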
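The expiry check stops a drill from lingering. Once the run's window has passed, any `SwarmWorker` jobs still in the backlog are discarded instead of doing work, so the queue drains quickly. Below is a Python sketch of that check, using snake_case versions of the fields named above. The boolean return value is a hypothetical addition so callers (or tests) can tell skipped jobs apart.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SwarmWorker:
    start_time: float      # when the busywork run began (epoch seconds)
    global_timeout: float  # how long the whole run may last, in seconds
    work_timeout: float    # how long this job occupies a worker, in seconds


def handle_swarm_worker(
    cmd: SwarmWorker,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the job did its work, False if it was skipped as stale."""
    if clock() > cmd.start_time + cmd.global_timeout:
        # The drill is over: drop the job rather than occupying a worker.
        return False
    sleep(cmd.work_timeout)
    return True
```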
## Implementation
- Implement `OccupyWorker` command and receiver
- Implement `SwarmWorker` command and receiver
- Implement `leadflo:busywork` CLI command