# Worker Pull Model for Provers
## Push Model (current system)
Currently, the `worker` pushes prover tasks to available provers in a [ProverPool](https://github.com/NebraZKP/worker-ts/blob/develop/src/utils/proverPool.ts).
The drawbacks of this approach are twofold:
1. The set of provers is fixed when the `worker` process is instantiated. Moreover, the worker must know fine-grained details about the provers, such as IP addresses, ports, etc.
2. New provers cannot be (easily) dynamically added to auto-scale proving capacity.

## Pull Model (overview)
To remedy these issues, we propose a pull model in which the `worker` does not need to know about the set of provers a priori. Instead, `prover_task_queue`s sit between the `worker` and the `provers`. The `worker` pushes prover tasks onto these queues, and provers pull jobs at their convenience and respond with proofs.

## RabbitMQ RPC Pattern Overview
RabbitMQ is a message broker: publishers send messages to a queue, and consumers pull messages from it. In our system, the worker is the publisher and the various provers are the consumers.
We will use the pattern from the [RPC tutorial](https://www.rabbitmq.com/tutorials/tutorial-six-javascript) on RabbitMQ's website for our system as well. In the RPC pattern:
- Clients (publishers) make requests to servers (consumers) by pushing (enqueuing) tasks onto an `rpc_queue`.
- Each request has a unique `correlation_id` that identifies it.
- Servers then pull (dequeue) tasks from the queue, complete them, and push the response back to an anonymous, exclusive callback queue *on the client*. Each request thus has its own callback queue, which waits for a response with a particular `correlation_id`. The `rpc_queue`, on the other hand, is shared amongst clients and servers, so multiple servers can fulfill tasks from a single `rpc_queue` (a client-side sketch follows below).
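
As a rough illustration of the client side of this pattern, here is a minimal sketch using the `amqplib` npm package. The connection URL and queue names are placeholders for illustration, not our actual configuration:

```typescript
import amqp from "amqplib";
import { randomUUID } from "node:crypto";

// Sketch of an RPC client: publish a request onto `rpc_queue` and await
// the response carrying the matching correlation id.
async function rpcRequest(payload: Buffer): Promise<Buffer> {
  const conn = await amqp.connect("amqp://localhost");
  const channel = await conn.createChannel();
  await channel.assertQueue("rpc_queue", { durable: false });

  // Anonymous ("" lets the broker generate a name), exclusive callback queue.
  const { queue: callbackQueue } = await channel.assertQueue("", {
    exclusive: true,
  });

  const correlationId = randomUUID();
  const response = new Promise<Buffer>((resolve) => {
    channel.consume(
      callbackQueue,
      (msg) => {
        // Ignore anything that does not match our request.
        if (msg && msg.properties.correlationId === correlationId) {
          resolve(msg.content);
        }
      },
      { noAck: true }
    );
  });

  // Enqueue the request, telling servers where to reply.
  channel.sendToQueue("rpc_queue", payload, {
    correlationId,
    replyTo: callbackQueue,
  });
  return response;
}
```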

## Using RabbitMQ in our Worker System
Let's take generating `outer` proofs as a running example ([code](https://github.com/NebraZKP/worker-ts/blob/develop/src/aggregation-pipeline/outerProofGenerator.ts)).
Currently, the push-based system works as follows:
- `outer` prover tasks are pushed onto the `OuterProofParams` async queue.
- When the `process` function is called on an element of this async queue, an async request is made to the `outerProverPool` via:
```typescript
await this.proverPool.request(outerProverInput)
```
- Under the hood, this request queries for an available prover, pushes a task to it, and waits for the response (a simplified sketch follows).
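
For context, a much-simplified sketch of what such a pool-based request amounts to. These are hypothetical types for illustration only, not the actual `ProverPool` code:

```typescript
// Hypothetical types for illustration only; this is not the actual
// ProverPool implementation.
interface Prover {
  prove(input: unknown): Promise<unknown>; // e.g. an HTTP call to a known IP/port
}

class ProverPoolSketch {
  // The set of provers is fixed at construction time.
  constructor(private provers: Prover[]) {}

  async request(input: unknown): Promise<unknown> {
    const prover = await this.waitForAvailable();
    return prover.prove(input);
  }

  private async waitForAvailable(): Promise<Prover> {
    // ...block until some prover in `this.provers` is idle...
    return this.provers[0];
  }
}
```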
The new pull-based system will work as follows:
- When the `process` function is called, a prover task request will be pushed onto the `prover_rpc_queue` (see the discussion of the RabbitMQ RPC pattern in the section above).
- Any `outer` prover can pull a prover task, complete it, and push the resulting proof to the corresponding callback queue.
- The `process` function awaits a response on the callback queue for this request; once one arrives, it continues with its business logic.
Advantages of this approach are:
- The `worker` needs to know absolutely nothing about the `outer` prover (IP addresses, ports, etc.). It just needs to know the endpoint of the `prover_rpc_queue`.
- `outer` provers can be spun up at any point (even after the `worker` process starts) and start pulling tasks from the `prover_rpc_queue`. Each task on the `prover_rpc_queue` comes with a `replyTo` field, which identifies the callback queue to which the response should be pushed (see the prover-side sketch below).
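
On the prover side, a consumer might look roughly like this (again with `amqplib`; `generateOuterProof` is a stand-in for the real proving logic, and the connection URL is a placeholder):

```typescript
import amqp from "amqplib";

// Stand-in for the actual proving logic.
declare function generateOuterProof(input: Buffer): Promise<Buffer>;

async function runOuterProver(): Promise<void> {
  const conn = await amqp.connect("amqp://rabbitmq-host");
  const channel = await conn.createChannel();
  await channel.assertQueue("prover_rpc_queue", { durable: true });

  // Fair dispatch: take at most one unacknowledged task at a time.
  await channel.prefetch(1);

  await channel.consume("prover_rpc_queue", async (msg) => {
    if (msg === null) return;
    const proof = await generateOuterProof(msg.content);

    // Push the proof to the callback queue named in `replyTo`,
    // echoing the request's correlation id.
    channel.sendToQueue(msg.properties.replyTo, proof, {
      correlationId: msg.properties.correlationId,
    });
    channel.ack(msg);
  });
}
```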

Note that we shaded the callback queues blue because they exist within the `worker` process.
## Robustness to Worker Restarts
If the `worker` restarts, two things happen: jobs already on the `prover_rpc_queue` become stale, and responses to these tasks may confuse the restarted worker. A naive way to address this is to run a script that simply purges the queue (see [this Stack Overflow answer](https://stackoverflow.com/questions/5313027/how-do-i-delete-all-messages-from-a-single-queue-using-the-cli)).
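
For example (queue name and connection URL assumed for illustration), the purge can be run from the CLI or from a small startup script:

```typescript
import amqp from "amqplib";

// Purge stale tasks when the worker (re)starts. CLI equivalent, per the
// linked Stack Overflow answer: `rabbitmqctl purge_queue prover_rpc_queue`.
async function purgeStaleTasks(): Promise<void> {
  const conn = await amqp.connect("amqp://rabbitmq-host");
  const channel = await conn.createChannel();
  await channel.assertQueue("prover_rpc_queue", { durable: true });
  await channel.purgeQueue("prover_rpc_queue");
  await conn.close();
}
```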
## Robustness to Provers Dying and RabbitMQ Process Going Down
RabbitMQ provides message acknowledgements as well as queue durability and message persistence. Acknowledgements allow a task to be returned to the queue if a consumer dies before completing it. Durability and persistence allow queues and their tasks to be saved to disk, so that if the RabbitMQ process itself dies, tasks on the queue are not lost.
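
In `amqplib`, these options show up as follows (a sketch; the `durable`, `persistent`, and explicit-ack flags are the standard ones from the RabbitMQ tutorials, while the queue name and URL are placeholders):

```typescript
import amqp from "amqplib";

async function durableSetup(): Promise<void> {
  const conn = await amqp.connect("amqp://rabbitmq-host");
  const channel = await conn.createChannel();

  // Durable queue: the queue definition survives a broker restart.
  await channel.assertQueue("prover_rpc_queue", { durable: true });

  // Persistent message: the task itself is written to disk.
  channel.sendToQueue("prover_rpc_queue", Buffer.from("task"), {
    persistent: true,
  });

  // Manual acks (noAck: false): an unacknowledged task is returned to the
  // queue if the consumer dies before finishing it.
  await channel.consume(
    "prover_rpc_queue",
    (msg) => {
      if (msg !== null) {
        // ...do the work, then acknowledge...
        channel.ack(msg);
      }
    },
    { noAck: false }
  );
}
```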
