# Use Ocluster to schedule CB pipelines on multiple machines
For Current-Bench, we can currently run benchmarks on only one machine (autumn). To scale up, and to allow Sandmark benchmarks to run on customized multicore machines, we want to use OCluster to dispatch jobs to different workers. As a future goal, ocaml-ci could perhaps use these machines to run CI jobs when they are idle.
OCluster workers are currently restricted to building an image from a Dockerfile or OBuilder spec. We want to extend their capability so that workers can run an OCurrent pipeline. This way, we will be able to cut out the _"build and run"_ sub-pipeline of the CB pipeline and create workers specialized for that task.
Ideally, the design is fairly independent of Current-Bench's needs: it allows any subgraph of a pipeline to be extracted and run on specialized workers. Clients and workers are expected to treat the OCluster pool name as a contract that the workers in that pool will evaluate their jobs with the right pipeline.
When a client submits a new job to the cluster, the assigned worker interprets it as an input node for its pipeline. The worker can then stream back to the client the state of the output nodes and the logs of its internal jobs. These output nodes appear as standard OCurrent values on the client.
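To make this more concrete, here is a rough sketch of what the client side might look like. Everything in it is hypothetical: the `Remote` module, its `submit`/`output` functions, and the pool name are placeholders for an API that does not exist yet, not the actual OCluster interface.

```ocaml
(* Hypothetical sketch only: [Remote], [submit], [output] and the
   pool name are placeholders, not an existing API. *)
let pipeline ~cluster (commit : Commit.t Current.t) =
  (* The submitted job becomes an input node of the worker's pipeline. *)
  let remote = Remote.submit cluster ~pool:"cb-build-and-run" commit in
  (* The worker streams back the state of its output nodes; on the
     client they appear as standard OCurrent values. *)
  let metrics = Remote.output remote "output" in
  Current.map record_in_database metrics
```

The point of the sketch is that, from the client's perspective, nothing special happens: `metrics` is an ordinary `Current.t` value that the rest of the pipeline can consume.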
As an example, this is what the pipeline graph currently looks like on Current-Bench, with only one repository:
![](https://i.imgur.com/Amx1dlj.png)
With the OCluster integration, the same graph will look like this:
![](https://i.imgur.com/qmTYR6R.png)
The orange `ocluster-pipeline` node symbolises the connection to an active worker executing the sub-pipeline. The original graph had some implicit nodes that are now explicit: our CB database is filled progressively with the `build_job_id`, then the `run_job_id`, and finally the `output`, which is the JSON metrics resulting from the benchmark run. Here the worker has not yet completed the run, so the `output` is still pending (orange).
The worker runs the extracted sub-pipeline, with the same three output nodes:
![](https://i.imgur.com/Sh38rcU.png)
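On the worker side, the extracted sub-pipeline might be declared roughly as follows. This is only a sketch under the assumption that `Current_worker` exposes a way to name output nodes; `Current_worker.outputs`, `build`, `run_benchmark`, `parse_metrics`, and `job_id` are all illustrative placeholders rather than real functions.

```ocaml
(* Hypothetical sketch: [Current_worker.outputs], [build],
   [run_benchmark], [parse_metrics] and [job_id] are placeholders. *)
let worker_pipeline (commit : Commit.t Current.t) =
  let image = build commit in              (* yields build_job_id *)
  let results = run_benchmark image in     (* yields run_job_id   *)
  let metrics = parse_metrics results in   (* JSON benchmark data *)
  (* Declare the three nodes whose states are streamed back to the
     client as they change. *)
  Current_worker.outputs
    [ "build_job_id", job_id image;
      "run_job_id", job_id results;
      "output", metrics ]
```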
A known limitation is that the worker pipeline can only receive one input from the client, as it's not currently possible for the client to stream the status of multiple nodes to the workers. As you would expect, the client will cancel the job if its pipeline is no longer required (freeing the worker to do something else), and the worker will inform the client of any failure. Finally, although not shown in the diagram, a worker can have multiple jobs in progress.
----------------------
We would like to break the work into a few PRs to ease the review of the proposed changes:
1. Extend the OCluster protocol so that job descriptions can carry a different set of information than "how to build this repository". We also need to extend the OCluster logs so that workers can stream structured information to the client about the `Current.state` of their output nodes and about their jobs' logs / artifacts.
Nearly done; we just need to check that it works correctly with step 3 before submitting: https://github.com/art-w/ocluster/tree/capnp-anypointer The changes are in principle backward compatible: existing schedulers, workers, and clients don't need to be upgraded to interact with schedulers/workers/clients using the new protocol (as long as they don't use the extension).
2. Simplify the definition of new workers.
This will be a small refactoring to introduce a functor that splits the OCluster protocol logic from the worker-specific implementation of `build`, `purge`, `update`, etc.
Nearly done; it needs some tiny adjustments to ease the completion of step 3.
3. Add a `Current_worker` public library that can be used to create a worker that executes a pipeline.
Remaining TODOs:
* Use the protocol modifications (of step 1) rather than marshalling inputs/outputs to strings.
* Stream jobs' logs and file artifacts from the worker to the client.
* It's not clear what it means for a worker to be done with a job. If the worker's pipeline has external sources, then its pipeline can re-trigger and produce new outcomes. So by default, the worker stays connected to the client. We plan to introduce some helper functions on the client and worker to have more control on job completion.
* Related: it's not yet clear how many jobs are really active on a worker (and whether it should request more jobs from the cluster).
4. Extend the `Current_ocluster` library so that clients can submit custom jobs to their `Current_worker`, and receive the outputs/logs/etc. The remaining TODOs are the dual of step 3.
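As a rough illustration of the refactoring in step 2, the functor could take a module describing only the worker-specific operations, while the protocol logic (registration with the scheduler, job requests, cancellation, log streaming) lives in the functor body. The module type below is a sketch; the actual names and signatures in OCluster will differ.

```ocaml
(* Hypothetical sketch of the step-2 functor; names and signatures
   are illustrative, not the actual OCluster API. *)
module type WORKER_IMPL = sig
  type job
  (* Decode a job description received from the scheduler. *)
  val parse : string -> job
  (* Execute a job, streaming its logs through [log]. *)
  val build : log:(string -> unit) -> job -> (unit, string) result Lwt.t
  (* Housekeeping operations required by the cluster protocol. *)
  val purge : unit -> unit Lwt.t
  val update : unit -> unit Lwt.t
end

(* Everything protocol-related is shared; only the operations above
   are worker specific. *)
module Make (W : WORKER_IMPL) : sig
  val run : pool:string -> capacity:int -> unit Lwt.t
end
```

With such a split, the `Current_worker` library of step 3 would just be one more instance of the functor, next to the existing Docker/OBuilder worker.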
----------------------
Some ideas we could explore later:
- The client could send endpoints as part of the job description, so that the worker can communicate with the client directly, without passing its results through the cluster.
- It should be made a bit easier to create workers that listen for jobs from multiple clusters.